Gowtham Potureddi

Posted on Jun 10

SQL Data Analyst Jobs in 2026: Salary, Interviews & Top Hiring Markets

#python #sql #dataengineering #interview

sql data analyst jobs are the highest-volume, longest-tenured analytics role on the modern labour market — and 2026 is the year the title finally stopped looking like "spreadsheets with a database tab" and started looking like "the operating system of the business." The role has absorbed a decade of BI churn, an LLM revolution that promised to replace it (and did not), and a generation of analytics-engineer adjacent tooling that turned the median analyst into a half-engineer who still owns the metric.

This guide is the 2026 reality check on every part of the SQL analyst journey — what the market actually looks like, where the data analyst jobs are concentrated, what a fair offer looks like in dollars / pounds / euros / INR, what the interview loop is really testing, and a twelve-week application roadmap you can copy into Notion tomorrow. Each section pairs a market explainer with a worked interview-style answer — SQL code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why the answer holds up on FAANG, scale-up, and remote-first scorecards alike.

When you want hands-on reps immediately after reading, drill the aggregation practice library →, rehearse on window functions problems →, and stack the company-specific muscles with Meta SQL drills → and Amazon SQL drills →.

On this page

State of the SQL data analyst market in 2026
Top hiring markets — US, EU, India, remote
Salary bands by region and seniority
Anatomy of the SQL analyst interview loop
2026 application roadmap — CV, portfolio, prep, outreach
Cheat sheet — SQL analyst job-search recipes
Frequently asked questions
Practice on PipeCode

1. State of the SQL data analyst market in 2026

The analyst role survived the LLM wave by absorbing analytics engineering and stakeholder storytelling

The one-sentence invariant: in 2026, a SQL data analyst is a half-engineer, half-product partner who owns a metric end to end — from raw event log, through dbt model, into a Looker / Mode dashboard, into a stakeholder Slack message that ends with a decision. The job title has not changed since 2014; the job contract has expanded by roughly 40%.

Why the analyst job did not get automated away.

LLMs are great at writing SQL when the schema, the metric definition, and the business question are all crisp. They are catastrophic at the part of the job where the analyst negotiates which metric to compute, which edge cases to ignore, and which dashboard the VP will actually open.
The hardest part of analytics has always been the framing, not the keystrokes. "Is engagement down or did we change the definition?" is a 45-minute conversation, not a prompt. The job paid for that conversation in 2014 and it pays for it more in 2026.
Analytics engineering (dbt, Snowflake, BigQuery, Materialize) raised the floor on what every analyst is expected to ship. The median analyst now owns a dbt model and a metric layer, not just a saved query.
BI tooling churn (Tableau → Looker → Mode → Hex → Sigma → Omni) keeps the dashboard surface in flux, but SQL is the constant. Companies hire for SQL because the visualisation layer is six months from changing again.

The three flavours of "data analyst" in 2026.

Data Analyst (classic). Owns a metric, an exec dashboard, ad-hoc analyses. Lives in SQL + Looker / Mode + a deck. Reports to a product, marketing, or operations director. Most postings still use this title.
Analytics Engineer. Owns the dbt project, the metric layer, the data contract with engineering. Lives in SQL + dbt + Git + CI. Reports to a data engineering manager or head of data. Highest-growth title since 2022.
Business Intelligence Analyst. Owns the BI tool, the report library, the data literacy training. Lives in Looker / Tableau + SQL + change-management workshops. Often the most senior-facing of the three.

The "T-shaped analyst" expectation.

Vertical bar (deep). SQL — window functions, CTEs, NULL handling, query plans, dbt modelling, source-to-staging-to-mart layering, incremental tables, snapshots.
Horizontal bar (wide). Python for data wrangling and ad-hoc viz (pandas, plotly), one BI tool to expert level, basic Git, stakeholder communication, A/B test framing, funnel analysis, retention curves.
The mistake juniors make. Drilling SQL to the exclusion of everything else, then bombing the case study round where the interviewer asks "tell me what metric you'd pick and why."

What interviewers listen for in screens.

Do you say "my role was" and immediately pivot to a metric and an action it drove? — senior signal.
Do you reach for CTEs and window functions without prompting on a non-trivial probe? — required answer at most companies.
Do you mention dbt or the metric layer when asked how you'd modularise a metric definition? — strong signal for analytics-engineer-adjacent roles.
Do you recognise that dashboards no one opens are not analyst output? — senior signal that you understand the stakeholder side of the job.

The 2026 reality.

Demand is still strong — LinkedIn shows roughly 30,000–40,000 open data-analyst postings in the US alone at any given week in early 2026, with London + Berlin + Bengaluru adding another 12,000–18,000 in their respective markets.
The bar moved up. Junior postings that asked for "SQL + Excel" in 2020 now ask for "SQL + Python + dbt + Looker / Mode + one A/B test framing." The volume is the same; the skill density is higher.
AI tooling did not collapse the market — it shifted demand toward analysts who can audit an AI-generated SQL query, frame the question correctly, and tell the stakeholder the answer was wrong even when it looked right.
Remote-first roles account for ~18% of postings, down from a 2021 peak of 32% but stable since mid-2024. Hybrid (3-2 in office) is the most common policy.

Worked example — sizing your "metric ownership" claim

Detailed explanation. Recruiters and hiring managers screen analysts on a single signal in the first ten seconds of the resume: did this person own a metric, or did they only generate reports? The fastest way to fail the screen is to list dashboard names without saying what changed because of them. The fastest way to pass is a single bullet of the form "drove metric by pct via mechanism."

Question. You shipped a dashboard that tracks weekly active users for the mobile checkout team last quarter. It is currently described on your CV as "Built mobile checkout WAU dashboard." Rewrite the bullet in the impact format an interviewer expects — and write a SQL snippet that backs up the metric.

Input.

metric	before	after	mechanism
mobile checkout WAU	412,000	478,000	A/B-tested new payment flow + funnel cleanup

Code.

-- Metric definition the analyst now owns
WITH mobile_checkout_sessions AS (
    SELECT
        DATE_TRUNC('week', event_ts) AS week_start,
        user_id,
        platform,
        flow_variant
    FROM events
    WHERE event_name = 'checkout_completed'
      AND platform   = 'mobile'
      AND event_ts  >= DATE_TRUNC('quarter', CURRENT_DATE)
)
SELECT
    week_start,
    COUNT(DISTINCT user_id)                                            AS wau,
    COUNT(DISTINCT user_id) FILTER (WHERE flow_variant = 'new_flow')   AS wau_new_flow,
    COUNT(DISTINCT user_id) FILTER (WHERE flow_variant = 'control')    AS wau_control
FROM mobile_checkout_sessions
GROUP BY week_start
ORDER BY week_start;

Step-by-step explanation.

The CTE narrows the events table to "mobile checkout completions this quarter." Owning the metric means owning the definition — narrow first, then aggregate.
The outer query computes the headline WAU, plus split WAUs by experiment variant. That single change turns the dashboard from a vanity number into an A/B-test diagnostic.
FILTER (WHERE ...) lets the analyst count distinct users within each variant without joining the experiment-assignment table a second time. Modern dialects (Postgres / Snowflake / BigQuery) all support it.
The metric definition lives in version control — when the VP asks "why did WAU change?" the analyst can show the dbt model diff, not just a screenshot.

Output.

week_start	wau	wau_new_flow	wau_control
2026-01-05	412,103	0	412,103
2026-01-12	421,876	18,442	403,434
2026-01-19	443,219	64,118	379,101
2026-01-26	478,041	247,553	230,488

Rule of thumb. Every analyst CV bullet should read "drove metric direction by pct via mechanism (stack)." Anything shorter is dashboard-name padding; anything longer is bragging. One pattern, six bullets, two pages.

Worked example — auditing an AI-generated SQL query

Detailed explanation. Every analyst job in 2026 includes some version of "the PM pasted ChatGPT's SQL into Slack — does this look right?" Hiring managers probe this directly because it is the part of the job that LLMs cannot reliably do alone. The skill is reading SQL critically, naming the bug, and proposing the fix.

Question. A PM sends you the SQL below claiming it shows "conversion rate by signup country last week." Find the three bugs and rewrite it.

Input — a buggy AI-generated query.

SELECT
    country,
    COUNT(*) / COUNT(*) AS conversion_rate
FROM users u, orders o
WHERE u.id = o.user_id
  AND signup_date > '2026-01-01'
GROUP BY country;

Code (the fix).

WITH last_week_signups AS (
    SELECT id AS user_id, country
    FROM users
    WHERE signup_date >= DATE_TRUNC('week', CURRENT_DATE - INTERVAL '7 days')
      AND signup_date <  DATE_TRUNC('week', CURRENT_DATE)
),
last_week_conversions AS (
    SELECT DISTINCT user_id
    FROM orders
    WHERE order_date >= DATE_TRUNC('week', CURRENT_DATE - INTERVAL '7 days')
      AND order_date <  DATE_TRUNC('week', CURRENT_DATE)
      AND status = 'placed'
)
SELECT
    s.country,
    COUNT(*)                                       AS signups,
    COUNT(c.user_id)                               AS conversions,
    ROUND(100.0 * COUNT(c.user_id)
                / NULLIF(COUNT(*), 0), 2)          AS conversion_rate_pct
FROM last_week_signups s
LEFT JOIN last_week_conversions c USING (user_id)
GROUP BY s.country
ORDER BY conversion_rate_pct DESC NULLS LAST;

Step-by-step explanation.

Bug 1 — COUNT(*) / COUNT(*) always equals 1. The PM (or the LLM) wanted converted-users / signups, not the same count divided by itself. The fix uses two CTEs and counts each population independently.
Bug 2 — implicit join (users u, orders o) silently turned an INNER JOIN into a Cartesian-filtered scan. Modern code review blocks implicit joins; switch to LEFT JOIN ... USING (user_id) so users without orders still appear in the denominator.
Bug 3 — signup_date > '2026-01-01' is not "last week." It is "everything since New Year." The fix uses an explicit ISO-week window with both inclusive lower and exclusive upper bounds.
The NULLIF(COUNT(*), 0) is a defensive division — if a country had zero signups in the window it would otherwise raise a div-by-zero error and crash the dashboard refresh.

Output (toy data).

country	signups	conversions	conversion_rate_pct
US	12,041	3,210	26.66
GB	4,210	1,011	24.01
DE	3,118	612	19.63
FR	1,802	281	15.59

Rule of thumb. Treat every AI-generated SQL query like a junior pull request — read for intent first (does the metric definition match the business question?), then correctness (do the joins, filters, and date windows line up?), then defensive hygiene (NULL handling, divide-by-zero, time-zone consistency).

Worked example — picking the right title to apply for

Detailed explanation. A senior analyst with three years of dbt experience is debating between "Data Analyst," "Analytics Engineer," and "Senior Data Analyst" postings. The titles overlap heavily; the salary bands do not. Picking the right title at the resume-keyword stage is half the battle.

Question. Given your current responsibilities — owning two dbt models, building exec dashboards in Looker, occasionally writing PySpark for data quality — which posting title should you target, and why?

Input — three job postings.

title	required	nice to have	salary band (NYC mid-level)
Data Analyst	SQL + viz + stakeholder management	Python	$90-115K
Senior Data Analyst	SQL + viz + experimentation + stakeholder	dbt + Python	$115-145K
Analytics Engineer	SQL + dbt + Git + CI + data modelling	Python + Spark	$135-170K

Code. No SQL — this is a positioning exercise. The "code" is the resume bullet rewrite:

Before (generic): "Built Looker dashboards and wrote SQL."

After (Analytics Engineer): "Own 2 dbt models powering exec KPIs;
wrote 14 dbt tests catching 99.6% of data-quality regressions before
production; cut metric-refresh latency 38% via incremental models."

Step-by-step explanation.

The "Data Analyst" posting is too junior — the candidate has dbt experience and is undersold by the title.
The "Senior Data Analyst" posting is the safest match — every required skill is present, and the salary band is 30% above the junior posting.
The "Analytics Engineer" posting pays the most but requires explicit data-modelling and CI experience. If the candidate ships dbt tests + Git PRs + has any data-modelling story, apply.
The pragmatic answer is to apply to all three at different companies — the title is a recruiter-keyword filter, not a destiny. Tailor the top resume bullet to the posting; keep the rest of the CV stable.

Output.

posting	apply?	tailoring
Data Analyst	maybe (only at top-tier brand)	de-emphasise dbt
Senior Data Analyst	yes (highest ROI)	lead with metric ownership
Analytics Engineer	yes (highest ceiling)	lead with dbt tests + CI

Rule of thumb. Pick the title where you match 70%+ of the required column and 30%+ of the nice-to-have column. Below 70% required → recruiters reject; above 95% required → you are underselling. The sweet spot is "stretch but believable."

SQL interview question on market signal detection

A senior interviewer often opens with: "Tell me how you'd measure whether the data-analyst job market is heating up or cooling down — what data would you pull, and what SQL would you write against it?" It blends product sense, market literacy, and SQL — exactly the T-shape they want to probe.

Solution Using a job-post telemetry funnel

-- Assume a job_posts table refreshed daily from LinkedIn + Indeed signals
WITH weekly_posts AS (
    SELECT
        DATE_TRUNC('week', posted_at)             AS week_start,
        country,
        title_category,
        COUNT(*)                                  AS posts
    FROM job_posts
    WHERE title_category = 'data_analyst'
      AND posted_at >= CURRENT_DATE - INTERVAL '12 weeks'
    GROUP BY 1, 2, 3
)
SELECT
    week_start,
    country,
    posts,
    LAG(posts) OVER (PARTITION BY country ORDER BY week_start) AS prev_posts,
    ROUND(
        100.0 * (posts - LAG(posts) OVER (PARTITION BY country ORDER BY week_start))
             / NULLIF(LAG(posts) OVER (PARTITION BY country ORDER BY week_start), 0),
        2
    ) AS wow_pct_change
FROM weekly_posts
ORDER BY country, week_start;

Step-by-step trace.

week_start	country	posts	prev_posts	wow_pct_change
2026-03-30	US	38,201	NULL	NULL
2026-04-06	US	39,118	38,201	2.40
2026-04-13	US	41,002	39,118	4.82
2026-04-20	US	42,317	41,002	3.21
2026-04-27	US	42,580	42,317	0.62

The trace shows a steady upward drift in US data-analyst postings — a "heating" signal. A 12-week rolling view smooths weekly noise and surfaces inflexion points the recruiter would otherwise miss.

Output:

country	trend (12wk)	call
US	+11% net	heating — apply broadly
UK	+3% net	flat — apply to top targets
DE	-2% net	cooling — diversify markets
IN	+18% net	heating — Bengaluru / Hyderabad load up

Why this works — concept by concept:

DATE_TRUNC + GROUP BY — the canonical SQL pattern for "weekly counts." Keeps the time axis clean and the metric definition obvious.
LAG over PARTITION BY — gives you "previous value within country" in one window pass. Far cheaper than a self-join and far clearer in code review.
NULLIF on the denominator — guards against divide-by-zero on the first week (where LAG is NULL) and on countries with sudden zero-post weeks. A two-character hygiene win.
Rounded percentage outputs — analysts who ship raw doubles into dashboards look junior; rounding to 2 decimals is the polite default.
12-week rolling window — long enough to dampen single-week noise, short enough to spot quarterly seasonality. The market-research equivalent of a moving-average crossover.
Cost — O(rows × 1) for the window aggregate; identical plan to the equivalent CTE-and-self-join, just half the lines of SQL.

SQL
Topic — window functions
Window functions problems (SQL)

Practice →

2. Top hiring markets — US, EU, India, remote

Five metros plus one global remote segment hold 80% of the SQL data analyst job posts in 2026

The mental model in one line: SQL data analyst jobs concentrate where the data infrastructure budget concentrates — Big Tech HQs, fintech corridors, healthcare hubs, and the GCC belt — and the remote-first slice is real but smaller than the 2021 narrative. Once you know which five metros dominate, you can target your job search instead of spraying.

US metros leading SQL data analyst job posts.

New York Metro — the index leader, driven by fintech (Bloomberg, Citadel, Two Sigma, JPMC), media (NYT, Disney Streaming), and the Manhattan-based Big Tech offices (Meta, Google, Amazon). Salary ceilings are highest, COL premium is highest, and the analyst-to-engineer culture is the most mature.
SF Bay Area — the second-largest market but with the highest pay per posting. Pure tech (Meta, Google, Stripe, Airbnb, Pinterest, Plaid, Notion) plus enterprise-AI (OpenAI, Anthropic, Scale, Glean). Total compensation skews toward equity.
Seattle — Amazon + Microsoft anchor the market; smaller but growing presence from Stripe, Snowflake (Seattle hub), and DoorDash. Salary bands lag SF by ~10% but COL lags by ~25% — strong net-comp story.
Austin — the "second SF" narrative is overstated, but Tesla, Indeed, Dell, and a wave of fintech / AI scale-ups (Anaconda, Capital Factory portfolio) keep it healthy. Salary bands roughly 80% of NYC.
Chicago — Bloomberg, Morningstar, Salesforce (post-Slack), Uber Freight, and a deep base of healthcare + insurance analyst roles. Pay is competitive with Austin; cost of living is better.

EU hubs.

London — Europe's largest analyst market, with HSBC, Barclays, Revolut, Monzo, Wise, Deliveroo, and the European HQs of Meta / Google / Stripe / Bloomberg. Strong post-Brexit demand. Hybrid is the default; pure-remote is uncommon.
Berlin — Zalando, N26, GetYourGuide, Babbel, Delivery Hero, Klarna's Berlin team. Lower pay than London but lower tax + lower COL. Visa pathway via the EU Blue Card is friendly for non-EU candidates with degree + offer + salary above threshold.
Amsterdam — Booking.com, Adyen, ING, Picnic, Mollie, and the Netherlands HQs of Uber EMEA and Tesla EMEA. The 30% expat tax ruling sweetens net comp materially.
Dublin — European HQs of Meta, Google, LinkedIn, Stripe, Airbnb. Hiring volume tracks Big Tech headcount cycles; pay is the highest in the EU outside London.

India centres.

Bengaluru — the biggest analyst market in Asia. Global capability centres (GCCs) for Walmart, Target, Lowe's, Wells Fargo, Cisco, Goldman Sachs anchor demand; product companies (Flipkart, Razorpay, CRED, Swiggy, Zomato, Meesho) push the ceiling.
Hyderabad — Microsoft, Amazon (huge), Salesforce, Google, and a deep base of pharma + fintech GCCs. Pay is roughly 5-10% lower than Bengaluru but the demand is more stable.
Pune — strong on enterprise / banking GCCs (JPMC, Barclays, Deutsche Bank). Demand grew 14% YoY in 2025.

Remote-first companies.

GitLab — fully remote, no offices, English-speaking. Strong analyst pipeline; competitive comp by US-equivalent bands.
Automattic — same model; pay is location-adjusted but generous.
Zapier — fully remote, strong analytics team, friendly to mid-level hires.
Dbt Labs — fully remote, the centre of the analytics-engineering universe. Bar is high.
Hashicorp — fully remote since 2020; analyst team supports product + go-to-market.
Posthog — small but elite; remote, transparent salary calculator.
Buffer, Doist, Toggl, Toptal, Close, Hotjar — the wider tier of remote-friendly companies hiring analysts every quarter.

Hybrid policy trends.

3-2 in-office (three days in, two remote) is now the default for most analyst teams at Big Tech and scale-ups.
Full RTO (5 days in office) is mostly limited to JPMC, Goldman Sachs, Tesla, and a few Big Tech sub-orgs. The labour market reacted by routing analyst talent away from them when better-paying hybrid roles existed.
Fully remote held steady at 18% of postings — down from the 2021 peak, but stable. Most are scale-up or remote-native companies.

Visa-friendly companies and TN / H1B trends for 2026.

The H1B lottery remains the chokepoint for non-US candidates. Top sponsors in 2025 included Amazon, Microsoft, Google, Meta, Deloitte, Cognizant, Infosys.
L1 transfers (intra-company) bypass the lottery if you can land at a multinational with an Indian or European office first.
TN visa (Canada / Mexico) is the cleanest pathway for Canadian / Mexican citizens — same-day adjudication at the border, renewable indefinitely. The job description must explicitly map to a NAFTA-listed profession; "Computer Systems Analyst" works for analyst roles when scoped correctly.
UK Skilled Worker visa is reliable but expensive; the 2024 salary-threshold hike pushed analyst roles closer to the cutoff.
Germany Blue Card has the lowest friction in Europe — three years to permanent residency for non-EU professionals.

Where the highest analyst job growth is.

Fintech — 28% of postings, led by Stripe, Plaid, Wise, Revolut, Brex, Mercury, Modern Treasury.
Healthcare — 17%, led by UnitedHealth Optum, Komodo Health, Tempus, Flatiron Health.
Climate / energy — 11%, growing fastest; Watershed, Climavision, Sunrun, Tesla Energy, Octopus Energy.
AI tooling & infra — 19%, growing nearly as fast; OpenAI, Anthropic, Scale, Hugging Face, Weights & Biases, Modal, Replicate.

Worked example — narrowing your target metro list

Detailed explanation. A common mistake is "I'll apply everywhere." A better strategy is to pick three metros, two sectors, and one remote-first lane based on your runway, visa status, and lifestyle constraints. The SQL below quantifies the trade-off.

Question. Given the job_market table (postings + median salary by metro by sector), find the three metros where your "must-have salary" of $130K base is most likely to be met, weighted by posting volume.

Input.

metro	sector	postings	median_salary_usd
New York	fintech	8,400	142,000
SF Bay	AI tooling	6,200	168,000
London	fintech	4,800	92,000
Berlin	climate	2,100	78,000
Bengaluru	product	5,600	32,000
Remote-first	analytics-eng	3,400	138,000

Code.

WITH ranked AS (
    SELECT
        metro,
        sector,
        postings,
        median_salary_usd,
        CASE WHEN median_salary_usd >= 130000 THEN 1 ELSE 0 END AS meets_floor,
        postings * CASE WHEN median_salary_usd >= 130000 THEN 1 ELSE 0 END
            AS weighted_score
    FROM job_market
)
SELECT
    metro,
    sector,
    postings,
    median_salary_usd,
    weighted_score
FROM ranked
WHERE meets_floor = 1
ORDER BY weighted_score DESC
LIMIT 3;

Step-by-step explanation.

The CTE flags each row "meets floor" if median salary clears $130K. CASE WHEN is portable across every dialect.
The weighted_score multiplies posting volume by the floor flag — zero for rows below the floor, full volume for rows above. That bakes "how often will this metro pay me" into a single ranking number.
Filter to meets_floor = 1 to drop sub-threshold rows entirely; ORDER BY weighted_score DESC and LIMIT 3 surfaces the strongest three options.
The output is your shortlist: three metros + sectors where the salary floor is plausible and the posting volume justifies the application effort.

Output.

metro	sector	postings	median_salary_usd	weighted_score
New York	fintech	8,400	142,000	8,400
SF Bay	AI tooling	6,200	168,000	6,200
Remote-first	analytics-eng	3,400	138,000	3,400

Rule of thumb. Pick three metros, two sectors, one remote-first lane. Six concurrent application channels is the limit before you stop tailoring; below three, you under-sample the market.

Worked example — remote vs hybrid trade-off ranking

Detailed explanation. Remote-first sounds attractive but lowers your effective comp at the 90th-percentile end (Big Tech equity is harder to access remotely). Hybrid splits the difference. The decision should be quantified, not vibes.

Question. Given five candidate postings, rank them by adjusted total compensation — base + bonus + equity, minus a "commute tax" of $1K per in-office day per month, $12K per year.

Input.

posting	base	bonus	equity_annual	days_in_office	adjusted_tc
FAANG NYC (5d)	145,000	22,000	90,000	5	?
Scale-up Bay (3d)	135,000	15,000	65,000	3	?
Hybrid London (2d)	90,000	10,000	25,000	2	?
Remote-first US	140,000	12,000	20,000	0	?
Remote-first EU	95,000	8,000	12,000	0	?

Code.

SELECT
    posting,
    base,
    bonus,
    equity_annual,
    days_in_office,
    base + bonus + equity_annual
        - (days_in_office * 12000 / 5)             AS adjusted_tc
FROM offers
ORDER BY adjusted_tc DESC;

Step-by-step explanation.

The base TC formula is base + bonus + equity_annual. Equity is annualised — divide multi-year grants by the vest period.
The "commute tax" is $12K per year for a 5-day-a-week in-office role; pro-rated by days_in_office / 5. A 3-day role pays $7,200; a 2-day role pays $4,800; a remote role pays zero.
The SQL subtracts the commute tax from raw TC to give an "adjusted" number that reflects lifestyle cost. Real analysts pick a higher tax for their own constraints (childcare, partner job, climate).
The ranking now favours remote-first and hybrid roles unless the FAANG equity premium is large enough to outweigh the commute cost.

Output.

posting	adjusted_tc
FAANG NYC (5d)	245,000
Scale-up Bay (3d)	207,800
Remote-first US	172,000
Hybrid London (2d)	120,200
Remote-first EU	115,000

Rule of thumb. Make every offer comparison apples-to-apples by adjusting for commute, COL, equity vest schedule, and tax. The raw "base + bonus" comparison undersells remote roles for analysts with anchor constraints.

Worked example — sector growth detection

Detailed explanation. Targeting a growing sector beats targeting a flat sector at the same salary. A junior analyst who joined a climate startup in 2023 had 4 promotions of optionality in 2024-25; a peer who joined a flat retail BI team had one.

Question. Given quarterly posting counts by sector for the last six quarters, find the three fastest-growing sectors by compound quarterly growth rate.

Input.

sector	q1	q2	q3	q4	q5	q6
fintech	6,400	6,800	7,300	7,800	8,200	8,400
healthcare	4,100	4,200	4,400	4,800	4,900	5,000
climate	900	1,200	1,500	1,800	2,000	2,100
ai_tooling	3,400	4,100	4,900	5,500	5,900	6,200
classic_retail	7,800	7,400	7,200	7,000	6,800	6,500

Code.

WITH growth AS (
    SELECT
        sector,
        q1, q6,
        POWER(CAST(q6 AS DOUBLE) / NULLIF(q1, 0), 1.0/5) - 1 AS cqgr
    FROM sector_postings
)
SELECT
    sector,
    q1,
    q6,
    ROUND(100.0 * cqgr, 2) AS cqgr_pct
FROM growth
ORDER BY cqgr DESC
LIMIT 3;

Step-by-step explanation.

CQGR (compound quarterly growth rate) is (q6 / q1) ^ (1/5) - 1 — five compounding intervals between six observations. A single number per sector.
NULLIF(q1, 0) guards against any sector that opened the window with zero postings (would divide by zero otherwise).
The cast to DOUBLE ensures integer division does not silently truncate the ratio.
ORDER BY CQGR DESC + LIMIT 3 surfaces the three fastest-growing sectors. The "classic_retail" row has a negative growth — it would not pass the LIMIT cut.

Output.

sector	q1	q6	cqgr_pct
climate	900	2,100	18.49
ai_tooling	3,400	6,200	12.78
fintech	6,400	8,400	5.59

Rule of thumb. Among comparable salary bands, the sector with the highest CQGR over the last six quarters wins on optionality. Healthcare and analytics-engineer-adjacent are quiet compounders; climate and AI tooling are the loud ones.

SQL interview question on top-N market ranking

A senior interviewer often frames this as: "Given a job_postings table with metro, posted_at, salary_usd, and sector columns, write a query that returns the top 3 metros by job-posting volume for each sector in the last quarter. This is the SQL version of the targeting exercise above."

Solution Using `ROW_NUMBER()` window with `PARTITION BY`

WITH per_sector AS (
    SELECT
        sector,
        metro,
        COUNT(*) AS postings
    FROM job_postings
    WHERE posted_at >= DATE_TRUNC('quarter', CURRENT_DATE - INTERVAL '3 months')
      AND posted_at <  DATE_TRUNC('quarter', CURRENT_DATE)
    GROUP BY sector, metro
),
ranked AS (
    SELECT
        sector,
        metro,
        postings,
        ROW_NUMBER() OVER (PARTITION BY sector ORDER BY postings DESC) AS rn
    FROM per_sector
)
SELECT sector, metro, postings
FROM ranked
WHERE rn <= 3
ORDER BY sector, postings DESC;

Step-by-step trace.

sector	metro	postings	rn
fintech	New York	8,400	1
fintech	London	4,800	2
fintech	SF Bay	3,900	3
fintech	Singapore	1,800	4 (filtered)
ai_tooling	SF Bay	6,200	1
ai_tooling	New York	2,100	2
ai_tooling	Remote-first	1,600	3

ROW_NUMBER() resets at each new sector partition, so each row gets a within-sector rank. The outer query keeps only rows with rank ≤ 3, giving exactly three metros per sector.

Output:

sector	metro	postings
ai_tooling	SF Bay	6,200
ai_tooling	New York	2,100
ai_tooling	Remote-first	1,600
fintech	New York	8,400
fintech	London	4,800
fintech	SF Bay	3,900

Why this works — concept by concept:

PARTITION BY sector — restarts the ranking inside each sector group. Without it, you'd get a global top-3 dominated by the largest sector.
ROW_NUMBER vs RANK vs DENSE_RANK — ROW_NUMBER assigns 1, 2, 3 with no ties. If two metros had identical posting counts, RANK would skip the next number (1, 1, 3); DENSE_RANK would not skip (1, 1, 2). For a "top 3" filter, ROW_NUMBER is almost always what you want.
The two-CTE pattern — separate "compute the metric" from "rank the metric." Each CTE is short, debuggable, and reads top-to-bottom in a code review.
Quarter-aligned date window — DATE_TRUNC('quarter', ...) aligns to the start of the quarter; the inclusive lower + exclusive upper bound idiom avoids fence-post bugs on quarter boundaries.
Cost — O(rows) for the GROUP BY, O(rows × log(partition_size)) for ROW_NUMBER. Both fit a single query plan; no self-joins required.

SQL
Topic — aggregation
Aggregation problems (SQL)

Practice →

3. Salary bands by region and seniority

Base salary tells half the story — total compensation, equity vest, and cost-of-living-adjusted net pay tell the rest

The mental model in one line: a $145K NYC offer and a £75K London offer and an INR 32 LPA Bengaluru offer can be the same job, the same level, and the same employer — the salary band is denominated in the local market's COL, the local tax code, and the local equity culture. Once you stop comparing raw base across regions, the analyst comp landscape becomes legible.

US base-salary ranges (2026).

Junior (0-2 yrs) — $70-95K base in NYC / SF Bay; $60-80K in Austin / Chicago / Atlanta. Bonus 5-10%, equity minimal at non-FAANG, ~$10K/yr at FAANG.
Mid-level (3-5 yrs) — $100-145K base in NYC / SF Bay; $90-125K in tier-2 metros. Bonus 10-15%, equity $25-50K/yr at non-FAANG, $40-90K/yr at FAANG.
Senior (5-8 yrs) — $145-200K+ base in NYC / SF Bay (FAANG / unicorn); $140-180K in tier-2 metros. Bonus 15-20%, equity $60-150K/yr at FAANG, lower at non-FAANG.
Staff / Principal (8+ yrs) — $200-260K+ base at FAANG / unicorn; total comp into the $400-600K band with full equity vest.

US tier-2 metros (Austin / Chicago / Atlanta / Boston).

Bands roughly 80% of NYC/Bay Area at the base level.
Bonus and equity bands are usually identical to NYC for the same employer — only base differs.
Net comp after tax + COL often exceeds NYC for mid-level analysts; the tax-and-rent wedge is significant.

EU bands.

London — Junior £38-52K, Mid £60-90K, Senior £100-135K base. Bonus 10-25%. RSU access for US-listed employers is the wedge that makes London competitive with NYC at senior.
Berlin — Junior €45-60K, Mid €65-95K, Senior €100-130K base. Bonus 10-15%. RSU access at unicorns / FAANG. Tax burden higher than US, COL lower.
Amsterdam — Junior €50-65K, Mid €70-100K, Senior €110-145K. The 30% expat tax ruling boosts net comp by ~20% for qualifying internationals.
Dublin — bands track London closely; tax is friendlier on the lower bands, slightly higher on the upper bands.

India bands.

Junior (0-2 yrs) — INR 8-14 LPA at product companies and top GCCs; INR 5-9 LPA at services and tier-2 GCCs.
Mid (3-5 yrs) — INR 18-32 LPA at product / FAANG India; INR 14-22 LPA at standard GCCs. Bonus 10-15%, RSU access at FAANG / unicorn.
Senior (5-8 yrs) — INR 40-65 LPA at product / FAANG India; INR 30-45 LPA at standard GCCs. RSU access starts to dominate total comp.
Staff / Principal — INR 75 LPA-1.5 Cr+ at the top end; usually requires a strong specialty (dbt + Snowflake + Looker, or Spark + Kafka).

Bonus and equity splits.

FAANG / MAANG — Base 60% / bonus 10% / equity 30%. Equity dominates at mid-senior and above. RSU vest is typically 4-year cliff-free with quarterly vesting.
Unicorn (late-stage private) — Base 70% / bonus 10% / equity 20%. Equity is options (RSAs at very late stage); IPO-or-secondary timing is the lottery ticket.
Growth-stage startup (Series B-D) — Base 80% / bonus 5% / equity 15%. Equity is options at strike; outcome distribution is heavy-tailed.
Seed-Series A — Base 90% / bonus 0% / equity 10%. Equity is options; outcome distribution is mostly zero.
Public non-FAANG — Base 75% / bonus 15% / equity 10%. Comp is predictable but the equity premium is modest.

Specialty premiums.

dbt + Snowflake + Looker stack — +10-15% over generic SQL analyst. The skill bundle is in high enough demand that recruiters tag it explicitly.
A/B testing fluency (frequentist + Bayesian) — +5-10%, especially at scale-ups with product-data teams.
PySpark + Databricks — +10-15%, especially in India where pyspark is the GCC norm.
Looker / LookML deep ownership — +5-10% at companies with mature semantic layers.

Salary growth signals.

TC bumps by company tier. FAANG → +5-8% YoY for in-band performers, +20-30% for promoted to mid-senior. Unicorns → +10-15% YoY; aggressive at private valuations holding firm.
YoE acceleration. The 3-year and 5-year marks are the biggest jumps; the 8-year mark hits a soft ceiling unless you specialise into analytics-engineering manager or principal analyst.
Geo arbitrage. Moving from a tier-2 US metro to NYC at the same level is typically +25-35% base; the reverse is -15-20%. Plan accordingly.

Worked example — translating a remote offer into your local-comp frame

Detailed explanation. A remote-first US company offers $135K base + 12% bonus + $20K/yr options. You live in Berlin. The dollar offer must be re-framed in euros, after tax, after COL, to compare against a Berlin-based €85K hybrid offer.

Question. Compute the after-tax net annual comp for both offers and pick the one that wins on cash. Assume Berlin tax of 42% effective on €85K, US-source income via Germany of 35% effective (treaty + Berlin freelancer), COL parity.

Input.

offer	base_local	bonus_pct	equity_annual_local	tax_rate
Remote US (paid in EUR via PEO)	122,000 EUR	12	18,200 EUR	0.35
Hybrid Berlin	85,000 EUR	10	6,000 EUR	0.42

Code.

SELECT
    offer,
    base_local,
    base_local * bonus_pct / 100.0                      AS bonus,
    equity_annual_local,
    (base_local + (base_local * bonus_pct / 100.0)
                + equity_annual_local) * (1 - tax_rate) AS net_tc_eur
FROM offer_comparison
ORDER BY net_tc_eur DESC;

Step-by-step explanation.

Base comp converted into EUR at the spot rate first — the SQL assumes the conversion has already happened.
Bonus computed as a percentage of base. Equity already in EUR-equivalent (annualised RSU / option-share value).
Multiplying by (1 - tax_rate) gives net comp in euros after the relevant effective tax burden. This is the number that matters for spending power, not gross.
ORDER BY net_tc_eur DESC ranks the offers fairly.

Output.

offer	bonus	net_tc_eur
Remote US (paid in EUR via PEO)	14,640	100,724
Hybrid Berlin	8,500	57,710

Rule of thumb. Compare offers in net local currency after tax, after employer-side benefits (pension match, health cover), after commute / housing arbitrage. The raw gross base number is almost always misleading across countries.

Worked example — total-compensation projection over a 4-year vest

Detailed explanation. Equity is often quoted as a single 4-year grant value. The real cash flow back-loads heavily at FAANG (graded vest 25/25/25/25 with cliff) and even more at unicorns (cliff plus heavier later vest). Project the year-by-year TC so the offer comparison is honest.

Question. A FAANG mid-level analyst is offered $145K base + 15% bonus + $200K RSU over 4 years (25% per year). Project annual TC for each of the four years and the cumulative four-year total.

Input.

year	base	bonus_pct	rsu_vest_pct
1	145,000	15	25
2	145,000	15	25
3	145,000	15	25
4	145,000	15	25

Code.

WITH rsu_total AS (SELECT 200000 AS total_rsu),
projection AS (
    SELECT
        y.year,
        y.base,
        y.base * y.bonus_pct / 100.0          AS bonus,
        r.total_rsu * y.rsu_vest_pct / 100.0  AS rsu_this_year,
        y.base
            + y.base * y.bonus_pct / 100.0
            + r.total_rsu * y.rsu_vest_pct / 100.0  AS tc
    FROM offer_year y
    CROSS JOIN rsu_total r
)
SELECT
    year,
    base,
    bonus,
    rsu_this_year,
    tc,
    SUM(tc) OVER (ORDER BY year ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        AS cumulative_tc
FROM projection
ORDER BY year;

Step-by-step explanation.

The CTE materialises the total RSU grant as a single-row table — cheap reuse trick.
The projection multiplies bonus and RSU by their respective percentages of base / total, summing into a per-year TC.
The window SUM(tc) OVER (ORDER BY year ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) gives a running cumulative TC — the "four-year total so far" number that's most useful for stay-or-jump decisions.
Real-world refinements add merit increases, refresh grants, stock-price drift; this skeleton is the honest baseline.

Output.

year	base	bonus	rsu_this_year	tc	cumulative_tc
1	145,000	21,750	50,000	216,750	216,750
2	145,000	21,750	50,000	216,750	433,500
3	145,000	21,750	50,000	216,750	650,250
4	145,000	21,750	50,000	216,750	867,000

Rule of thumb. Convert every offer into a year-by-year TC projection. Compare cumulative four-year totals after tax, after refresh-grant assumptions, after a realistic stock-price scenario. Sticker price hides a lot.

Worked example — negotiation leverage from a benchmark query

Detailed explanation. The strongest negotiation move is "based on market data, the band for this level in this metro is X — here is my expectation." The SQL behind that move is unglamorous but powerful — pull peer-source data (Levels.fyi public dataset, internal comp tools), filter to your level + metro + company tier, and quote the median.

Question. Given a levels_fyi table with public submissions, find the median and 75th-percentile TC for "Senior Data Analyst" in "New York Metro" at "FAANG"-tier employers in the last twelve months.

Input. Assume levels_fyi table with rows (submitted_at, title, metro, tier, tc_usd).

Code.

SELECT
    title,
    metro,
    tier,
    COUNT(*)                                           AS submissions,
    PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY tc_usd) AS p50_tc,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY tc_usd) AS p75_tc
FROM levels_fyi
WHERE title = 'Senior Data Analyst'
  AND metro = 'New York Metro'
  AND tier  = 'FAANG'
  AND submitted_at >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY title, metro, tier;

Step-by-step explanation.

The WHERE narrows submissions to the relevant cell — same level, same metro, same employer tier, recent enough to reflect current bands.
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY tc_usd) is the ANSI median (continuous interpolation). Postgres, Snowflake, BigQuery all support it.
The 75th percentile is the "ambitious but defensible" anchor — quoting the median is the floor of your ask; quoting the 75th is the ceiling.
The COUNT column is the credibility check — if the cell has fewer than ~20 submissions, the medians are noisy and the leverage is weaker.

Output.

title	metro	tier	submissions	p50_tc	p75_tc
Senior Data Analyst	New York Metro	FAANG	84	245,000	282,000

Rule of thumb. Open the negotiation with "Based on 84 submissions in the last twelve months for this exact level / metro / tier, the median is $245K and the 75th percentile is $282K. My expectation lands between those numbers." Specificity is leverage; vague asks get vague offers.

SQL interview question on salary-band normalisation

A senior interviewer might frame this as: "We have analyst offers from NYC, London, and Bengaluru. Each offer is in its local currency. Write a query that converts every offer to USD, normalises by COL index, and ranks them by 'effective comp.'"

Solution Using a currency-and-COL adjustment join

WITH offers AS (
    SELECT 'New York'  AS metro, 145000 AS base_local, 'USD' AS ccy UNION ALL
    SELECT 'London',         95000,           'GBP' UNION ALL
    SELECT 'Bengaluru',   3200000,            'INR'
),
fx AS (
    SELECT 'USD' AS ccy, 1.00     AS to_usd UNION ALL
    SELECT 'GBP',         1.28           UNION ALL
    SELECT 'INR',         0.012
),
col_index AS (
    -- normalised so NYC = 100; <100 means cheaper to live there
    SELECT 'New York'  AS metro, 100 AS col_idx UNION ALL
    SELECT 'London',                85         UNION ALL
    SELECT 'Bengaluru',             35
)
SELECT
    o.metro,
    o.base_local,
    o.ccy,
    o.base_local * f.to_usd                          AS base_usd,
    ROUND( (o.base_local * f.to_usd) * 100.0
           / NULLIF(c.col_idx, 0), 2 )               AS col_adjusted_usd
FROM offers o
JOIN fx f          ON o.ccy   = f.ccy
JOIN col_index c   ON o.metro = c.metro
ORDER BY col_adjusted_usd DESC;

Step-by-step trace.

metro	base_local	ccy	base_usd	col_idx	col_adjusted_usd
New York	145,000	USD	145,000	100	145,000.00
London	95,000	GBP	121,600	85	143,058.82
Bengaluru	3,200,000	INR	38,400	35	109,714.29

The COL-adjusted USD figure normalises buying power back into a single comparable number. London ends up almost on par with NYC after the COL adjustment; Bengaluru's local-currency comp punches well above its raw-USD number once the cost basis is corrected.

Output:

metro	col_adjusted_usd
New York	145,000.00
London	143,058.82
Bengaluru	109,714.29

Why this works — concept by concept:

Join-driven normalisation — currency and COL are stored as small reference tables, joined into the main offers table. Adding a new metro = adding one row each in fx and col_index, zero query changes.
NULLIF on the COL denominator — protects against a missing COL index for a metro and avoids divide-by-zero on staging environments where the reference table is partially loaded.
VALUES-style inline tables — the UNION ALL blocks let you demonstrate the pattern without any base data; in production, the offers / fx / col_index tables live in their own schemas.
Two-step normalisation — first to a common currency (USD), then to a common COL basis. Skipping either step makes the comparison unfair.
Cost — three sequential hash joins over tiny dimension tables; the entire query is microseconds on real-world hardware.

SQL
Topic — joins
JOIN problems (SQL)

Practice →

4. Anatomy of the SQL analyst interview loop

Five stages, two-week median, one rubric — practice all five, not just SQL

The mental model in one line: a modern SQL data analyst interview loop is five stages — recruiter screen, SQL test, take-home case, virtual onsite, offer — and most candidates over-prepare stage 2 (SQL) and under-prepare stages 3-4 (case study and stakeholder roleplay). Once you treat the loop as a five-stage funnel with distinct rubrics, your prep allocation finally matches the actual reject curve.

Stage 1 — recruiter screen (~30 minutes).

What is scored. Resume signal, "why this role / company / metro," salary-expectation alignment, visa status, notice period.
What candidates underestimate. This stage is a two-way screen. The recruiter wants enough signal to pass you to the hiring manager; you want enough signal to know if the role is real (band, timeline, team scope).
The right opener. "I'm a level current-title who shipped metric impact using stack. I'm exploring target-role because clear reason."
The right salary answer. "My expectation is the upper half of the band for this level in this metro — based on market data and my experience. Can you confirm the band?"

Stage 2 — SQL HackerRank / DataLemur / take-home (60-90 min).

What is scored. Correctness on 2-4 problems, time-to-correct, choice of clean idiom over clever one-liner, behavior under time pressure.
What candidates underestimate. Reading the problem twice. Half the rejects come from rushing to type SQL before fully grasping the schema.
The right framing move. Always restate the question — "so I'm computing weekly active users, defined as distinct user_ids with at least one session_event in each ISO week, returned per week" — before opening the editor.
Common surfaces probed. Joins (inner vs left, multi-table chains), window functions (RANK, ROW_NUMBER, LAG, LEAD, SUM OVER), CTEs, NULL handling, date arithmetic, GROUP BY with HAVING, conditional aggregation.

Stage 3 — take-home analytics case (3-5 days, ~3-5 hours of work).

What is scored. Business framing, metric choice, depth of analysis, presentation craft (deck or notebook), trade-off articulation.
What candidates underestimate. This is where the take-home reject rate is highest — over-engineering the SQL and under-engineering the story of what the data shows.
The right shape. Problem statement → metric definition → SQL (in appendix) → 2-3 charts → finding → recommendation → caveat. Five pages, not twenty.
The fatal mistake. Including raw exploration without curation — the hiring manager wants to see the polished version, not the journey.

Stage 4 — virtual onsite (3-4 hours, 4-5 rounds).

Round 1 — SQL pairing (60 min). Live SQL on a shared screen with the hiring manager. Slightly harder than the screen problems, plus follow-up "what if the table is 100B rows?" probes.
Round 2 — case study (45-60 min). Open-ended business question; you whiteboard the metric, the SQL outline, and the recommendation.
Round 3 — behavioural / leadership principles (45 min). Five STAR stories ready: shipped impact, ambiguous problem, stakeholder conflict, technical disagreement, failure / learning.
Round 4 — stakeholder roleplay (30-45 min). The interviewer plays a confused PM; you must explain a metric, defuse pushback, and end with a decision.
Round 5 (some loops) — cross-functional fit / bar-raiser. PM, DS, or DE talks to you about how you'd collaborate; the bar-raiser asks the hardest behavioural question.

Stage 5 — bar-raiser / cross-functional fit + offer (3-7 days post-onsite).

What happens. Debrief among the panel; bar-raiser (or hiring committee) reviews the scorecards; recruiter calls you with an offer or a polite no.
The negotiation window. Most offers improve 5-15% with a single clean ask backed by competing-offer data or benchmark research. Always ask once, never demand twice.
Common rejection reasons. Weak window functions, no business framing, no metric trade-offs, monotone behavioural delivery, "tell me about a failure" answer with no learning.

What companies actually score.

Correctness (40%). Does your SQL return the right rows?
Communication (30%). Can you narrate your thinking aloud and explain trade-offs?
Business framing (20%). Do you ask about the metric definition, the population, the time window before writing SQL?
Partial credit on optimisation (10%). Do you notice when your query is O(n²) and either fix it or call it out?

Worked example — the canonical "second-highest salary" interview probe

Detailed explanation. This is the single most-asked SQL analyst interview question on the planet, used by Amazon, Meta, Stripe, and nearly every fintech. It tests window functions, NULL handling, and the candidate's awareness of ties.

Question. Given the employees table with (id, name, salary), write a query that returns the second-highest distinct salary. If there is no such salary (e.g. only one distinct value), return NULL.

Input.

id	name	salary
1	Alice	200,000
2	Bob	175,000
3	Cara	200,000
4	Diego	150,000
5	Eve	125,000

Code.

-- Idiom 1: DENSE_RANK + filter
WITH ranked AS (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rk
    FROM employees
)
SELECT DISTINCT salary AS second_highest_salary
FROM ranked
WHERE rk = 2;

-- Idiom 2: subquery + MAX
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Step-by-step explanation.

DENSE_RANK() is preferred over RANK() because it does not skip numbers on ties. With two rows at 200K, DENSE_RANK gives 1, 1, 2 — so the second-highest distinct salary is rank 2.
The CTE materialises the ranking; the outer query filters and SELECTs DISTINCT to dedupe ties at rank 2 itself.
The "subquery + MAX" idiom is the older, no-window-function fallback. It still works on every dialect and is the right answer when the interviewer says "do it without window functions."
The NULL-on-no-second-salary contract works for free in both idioms — if every employee has the same salary, WHERE rk = 2 matches no rows and the empty SELECT MAX returns NULL.

Output.

second_highest_salary
175,000

Rule of thumb. Memorise DENSE_RANK for "second-highest distinct" probes. Use plain RANK when ties should skip the next rank ("medal table" semantics). ROW_NUMBER is only correct when ties are impossible.

Worked example — running totals over a partition (window-function muscle)

Detailed explanation. Running totals are the second-most-common window-function probe — every retention, revenue, or session-funnel dashboard depends on them. The mistake juniors make is grouping with SUM + GROUP BY when the question wants per-row cumulative values.

Question. Given the orders table with (customer_id, order_date, amount), return one row per order with the customer's running cumulative spend up to and including that order.

Input.

customer_id	order_date	amount
1	2026-04-01	50
1	2026-04-15	75
1	2026-05-02	30
2	2026-04-08	100
2	2026-05-12	200

Code.

SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
ORDER BY customer_id, order_date;

Step-by-step explanation.

PARTITION BY customer_id resets the running total at each new customer — without it, the running total would span all customers.
ORDER BY order_date defines the running direction; the window evaluates "all rows from the start of the partition up to this one."
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the explicit frame clause. Many dialects assume it by default, but writing it explicitly makes the intent unmistakable and avoids the "range vs rows" footgun on tied order dates.
The result is one row per order with the running cumulative spend — exactly what the dashboard wants.

Output.

customer_id	order_date	amount	running_total
1	2026-04-01	50	50
1	2026-04-15	75	125
1	2026-05-02	30	155
2	2026-04-08	100	100
2	2026-05-12	200	300

Rule of thumb. Whenever a probe says "as of each row" or "cumulative through this point," reach for SUM(...) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). Memorise the full frame clause once; it stops the dialect-specific traps cold.

Worked example — funnel analysis with conditional aggregation

Detailed explanation. Funnel analysis is the canonical case-study probe in analyst interviews. The shape is "how many users hit step 1, step 2, step 3 — what is the conversion at each transition?" The SQL is conditional aggregation, often the first time juniors meet FILTER (WHERE ...) or SUM(CASE WHEN ... THEN 1 ELSE 0 END).

Question. Given the events table with (user_id, event_name, event_ts), compute a conversion funnel for signup → email_verified → first_purchase over last 30 days. Return one row with three counts and two conversion rates.

Input (toy).

user_id	event_name	event_ts
1	signup	2026-05-10
1	email_verified	2026-05-10
1	first_purchase	2026-05-12
2	signup	2026-05-11
2	email_verified	2026-05-12
3	signup	2026-05-14

Code.

WITH per_user AS (
    SELECT
        user_id,
        MAX(CASE WHEN event_name = 'signup'           THEN 1 ELSE 0 END) AS did_signup,
        MAX(CASE WHEN event_name = 'email_verified'   THEN 1 ELSE 0 END) AS did_verify,
        MAX(CASE WHEN event_name = 'first_purchase'   THEN 1 ELSE 0 END) AS did_purchase
    FROM events
    WHERE event_ts >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT
    SUM(did_signup)                                        AS step1_signups,
    SUM(did_verify)                                        AS step2_verified,
    SUM(did_purchase)                                      AS step3_purchased,
    ROUND(100.0 * SUM(did_verify)
                / NULLIF(SUM(did_signup),   0), 2)         AS verify_rate_pct,
    ROUND(100.0 * SUM(did_purchase)
                / NULLIF(SUM(did_verify),   0), 2)         AS purchase_rate_pct
FROM per_user;

Step-by-step explanation.

The per_user CTE collapses the event stream to one row per user with three boolean flags. MAX(CASE WHEN ...) is the dialect-portable conditional flag idiom.
The outer SELECT sums each flag to get total users at each funnel step.
The two ROUND + NULLIF expressions compute the verify-rate and purchase-rate as percentages of the previous step, with divide-by-zero protection.
The funnel can be extended to N steps by adding more boolean columns in the CTE — the shape stays identical.

Output.

step1_signups	step2_verified	step3_purchased	verify_rate_pct	purchase_rate_pct
3	2	1	66.67	50.00

Rule of thumb. Every analyst will be asked to build a funnel in their onsite. The CTE-collapses-to-one-row-per-user pattern + conditional aggregation is the most readable answer. Stay away from N-table self-joins.

Worked example — A/B test significance check

Detailed explanation. Mid-level and senior analyst interviews increasingly probe experimentation literacy. The classic question is "given a daily counts table for an A/B test, write SQL that returns whether the difference between A and B is statistically meaningful."

Question. Given ab_test_results with (test_date, variant, users, conversions), compute the conversion rate per variant and the absolute difference. Flag whether the difference is "directionally significant" (>2 percentage points) — a real significance test would require external stats, but the SQL prepares the numbers.

Input.

test_date	variant	users	conversions
2026-05-01	A	10,000	1,200
2026-05-01	B	10,200	1,470
2026-05-02	A	9,800	1,160
2026-05-02	B	10,100	1,440

Code.

WITH totals AS (
    SELECT
        variant,
        SUM(users)        AS total_users,
        SUM(conversions)  AS total_conv
    FROM ab_test_results
    GROUP BY variant
),
rates AS (
    SELECT
        variant,
        total_users,
        total_conv,
        100.0 * total_conv / NULLIF(total_users, 0) AS conv_rate_pct
    FROM totals
)
SELECT
    MAX(CASE WHEN variant = 'A' THEN conv_rate_pct END) AS a_rate,
    MAX(CASE WHEN variant = 'B' THEN conv_rate_pct END) AS b_rate,
    MAX(CASE WHEN variant = 'B' THEN conv_rate_pct END)
        - MAX(CASE WHEN variant = 'A' THEN conv_rate_pct END) AS abs_diff,
    CASE WHEN ABS(
        MAX(CASE WHEN variant = 'B' THEN conv_rate_pct END)
        - MAX(CASE WHEN variant = 'A' THEN conv_rate_pct END)
    ) >= 2 THEN 'directionally_meaningful'
           ELSE 'flat'
    END AS lift_flag
FROM rates;

Step-by-step explanation.

Aggregate across days first — totals per variant.
Compute conversion rate per variant in a follow-up CTE, with NULLIF guarding divide-by-zero.
The outer SELECT uses conditional MAX to pivot the two variants into one row with side-by-side rates.
A simple absolute-difference flag stands in for the actual two-proportion z-test the analyst would run in Python / scipy.

Output.

a_rate	b_rate	abs_diff	lift_flag
11.97	14.31	2.34	directionally_meaningful

Rule of thumb. For interviews, SQL prepares the rates and the difference; the statistical significance call still belongs in scipy / R / pre-baked dashboards. Don't pretend SQL alone tells you "the test won" — say so explicitly to the interviewer.

SQL interview question on retention curves (case-study probe)

A senior interviewer often opens the case study with: "We launched our mobile app three months ago. Define and compute a 30-day retention curve. Walk me through the SQL and the metric choice."

Solution Using a self-join + DATE_DIFF cohort pattern

WITH first_session AS (
    SELECT user_id, MIN(session_ts)::date AS install_date
    FROM sessions
    GROUP BY user_id
),
activity AS (
    SELECT
        fs.user_id,
        fs.install_date,
        DATE_DIFF('day', fs.install_date, s.session_ts::date) AS day_offset
    FROM first_session fs
    JOIN sessions s ON s.user_id = fs.user_id
    WHERE s.session_ts::date BETWEEN fs.install_date
                                 AND fs.install_date + INTERVAL '30 days'
)
SELECT
    install_date,
    COUNT(DISTINCT user_id) FILTER (WHERE day_offset = 0)  AS cohort_size,
    ROUND(100.0 *
        COUNT(DISTINCT user_id) FILTER (WHERE day_offset = 1)
        / NULLIF(COUNT(DISTINCT user_id) FILTER (WHERE day_offset = 0), 0),
        2) AS d1_retention_pct,
    ROUND(100.0 *
        COUNT(DISTINCT user_id) FILTER (WHERE day_offset = 7)
        / NULLIF(COUNT(DISTINCT user_id) FILTER (WHERE day_offset = 0), 0),
        2) AS d7_retention_pct,
    ROUND(100.0 *
        COUNT(DISTINCT user_id) FILTER (WHERE day_offset = 30)
        / NULLIF(COUNT(DISTINCT user_id) FILTER (WHERE day_offset = 0), 0),
        2) AS d30_retention_pct
FROM activity
GROUP BY install_date
ORDER BY install_date;

Step-by-step trace.

install_date	cohort_size	d1_retention_pct	d7_retention_pct	d30_retention_pct
2026-04-01	1,200	64.50	38.20	22.10
2026-04-08	1,340	63.10	41.10	24.80
2026-04-15	1,500	67.20	42.80	26.40

Each row is a weekly install cohort; the columns track what fraction of the cohort was active on day 1, day 7, and day 30 after install. The trace shows a gradually improving retention curve — exactly the diagnostic an analyst delivers to the growth team.

Output:

install_date	d1	d7	d30
2026-04-01	64.5%	38.2%	22.1%
2026-04-08	63.1%	41.1%	24.8%
2026-04-15	67.2%	42.8%	26.4%

Why this works — concept by concept:

Cohort pinning via first_session — each user is anchored to the day they installed, not the day they were active. This is what makes "30-day retention" mean the same thing for every cohort.
DATE_DIFF day_offset — converts an absolute session timestamp into "days since install." That single transformation makes the rest of the SQL trivial.
FILTER (WHERE day_offset = N) — conditional counts inside the same GROUP BY. Lets you produce d1 / d7 / d30 retention in a single scan, no self-joins.
NULLIF on cohort_size — defensive divide-by-zero protection for any cohort that started with zero installs (e.g. a partial-data day at the very end of the window).
One row per cohort — the dashboard reads the table directly; no Python post-processing required.
Cost — O(sessions) for the JOIN, O(cohort × day_offset_unique) for the aggregate. For most apps this fits comfortably in a single warehouse query.

SQL
Topic — cohort analysis
Cohort analysis problems (SQL)

Practice →

5. 2026 application roadmap — CV, portfolio, prep, outreach

Twelve weeks, four phases, one tracking system — the roadmap most SQL data analyst hires actually follow

The mental model in one line: a 12-week application roadmap converts an open-ended "I'm looking for analyst jobs" into a calendar with weekly outputs — and the candidates who keep that calendar honest are the ones who get offers in quarter two instead of quarter four. Pick a runway, copy the phases below into Notion, and start.

Phase 1 — weeks 1-2: CV optimisation, LinkedIn rewrite, GitHub / portfolio refresh.

CV impact bullets. Every bullet in the "Experience" section follows the drove metric by pct via mechanism (stack) pattern. Cut everything that doesn't fit.
Top three-line summary. "Data analyst with N years of stack experience. Shipped headline metric impact. Currently exploring target-role in target-metro."
LinkedIn keyword density. "SQL", "Python", "dbt", "Looker / Tableau / Mode", plus the verticals you've worked in. Recruiter search-engine optimisation is real.
GitHub readme template. Pin two end-to-end projects; each readme follows problem → data → SQL → finding → caveat → links.

Phase 2 — weeks 3-4: portfolio projects.

Two end-to-end case studies. Each one is a public-data project: download a dataset, frame a metric, write the SQL, build two-three charts, write a 500-word narrative.
Recommended datasets. NYC TLC trip data, Stack Overflow public BigQuery dataset, the Spotify "tracks" public dataset, the COVID public dashboard archives, the IMDB top-1000 movies dataset.
Recommended stack. dbt (free) + DuckDB or BigQuery free tier + Streamlit or Hex (free tier). Avoid paid Tableau-style tools — the portfolio should not assume the reviewer has a license.
One narrative per case study. Problem statement → metric definition → SQL excerpt → 2-3 charts → finding → caveat. Five hundred words, three images, one chart-of-the-week-worthy headline.

Phase 3 — weeks 5-10: SQL drill plan + case-study prep.

60-80 SQL problems across joins, windows, CTEs, NULL handling, conditional aggregation, date arithmetic. Aim for 8-12 per week.
Difficulty mix. 30% easy (warm-ups), 50% medium (interview-grade), 20% hard (FAANG-grade). Don't only drill hard — speed and reliability on mediums is what wins screens.
Case-study prep. Read 5 public case studies (Stripe data blog, Spotify engineering blog, Netflix tech blog, Airbnb data team blog, Wise eng blog). Internalise the shape — problem framing, metric choice, trade-offs.
A/B test framing. Practice five canonical questions: sample size, primary metric, guardrail metrics, ramp plan, decision rubric.

Phase 4 — weeks 11-12: targeted outreach.

Recruiter DMs. Pick 20 target companies. Find the analytics recruiter on LinkedIn. Send a 3-sentence DM: opener (mutual interest), specific signal (you've shipped X), ask (15-min chat about open roles).
Alumni warm intros. Search LinkedIn for "alumni of your university at target company." Coffee chat → referral → application.
Hiring-manager DMs. Identify the head of analytics at 5 dream companies. Send a slightly longer note: opener, specific signal, 2-sentence project pitch, ask (15-min call).
Tracking pipeline. Notion or a Google Sheet with columns: Company, Role, Channel (cold / referral / recruiter), Applied date, Stage, Notes, Next step.

Tracking pipeline in Notion / Sheets.

Columns at minimum. Company, posting URL, role title, salary band (if known), channel, applied date, current stage, next step, notes.
Status states. Applied → Screen → SQL test → Take-home → Onsite → Offer → Closed (accepted / declined / rejected).
Cadence. Update at end of every day; review weekly; archive closed rows monthly.
Why the tracking matters. Pipeline visibility = offer rate uplift. Candidates without a tracking system underestimate their conversion rate by ~2x and over-stress about individual rejections.

When to accept — the four-axis decision matrix.

Salary + equity. Is the comp competitive with my benchmark? Is the equity vest schedule reasonable? Is there a sign-on?
Growth + scope. Will I learn new things? Is there a path to senior / staff inside two years?
Team + manager. Did the hiring manager give me energy? Did the team feel high-trust?
Mission + product. Do I care about the problem? Will the product be around in 4 years?

Score each offer 1-5 on each axis. Below 3 on any axis is a yellow flag; below 2 is a red flag. Above 4 on three axes is a clear accept.

Worked example — application-pipeline tracking SQL

Detailed explanation. Treat your own job search as an analyst project. Track every application; compute stage-by-stage conversion rates; adjust strategy based on the numbers. The same SQL skill you sell to employers should be applied to the search itself.

Question. Given an application_pipeline table with (company, applied_at, current_stage, channel), compute stage conversion rates and channel breakdown. Identify the bottleneck.

Input.

company	applied_at	current_stage	channel
A	2026-04-01	onsite	referral
B	2026-04-02	rejected_screen	cold
C	2026-04-05	offer	recruiter
D	2026-04-07	take_home	referral
E	2026-04-08	sql_screen	cold

Code.

WITH stage_counts AS (
    SELECT
        channel,
        COUNT(*) FILTER (WHERE current_stage IN
            ('applied','sql_screen','take_home','onsite','offer'))     AS applied,
        COUNT(*) FILTER (WHERE current_stage IN
            ('sql_screen','take_home','onsite','offer'))               AS reached_sql,
        COUNT(*) FILTER (WHERE current_stage IN
            ('take_home','onsite','offer'))                            AS reached_takehome,
        COUNT(*) FILTER (WHERE current_stage IN
            ('onsite','offer'))                                        AS reached_onsite,
        COUNT(*) FILTER (WHERE current_stage = 'offer')                AS reached_offer
    FROM application_pipeline
    GROUP BY channel
)
SELECT
    channel,
    applied,
    ROUND(100.0 * reached_sql       / NULLIF(applied, 0), 1)  AS pct_to_sql,
    ROUND(100.0 * reached_takehome  / NULLIF(applied, 0), 1)  AS pct_to_takehome,
    ROUND(100.0 * reached_onsite    / NULLIF(applied, 0), 1)  AS pct_to_onsite,
    ROUND(100.0 * reached_offer     / NULLIF(applied, 0), 1)  AS pct_to_offer
FROM stage_counts
ORDER BY channel;

Step-by-step explanation.

The FILTER (WHERE current_stage IN ...) syntax counts only rows whose stage has reached or passed a given milestone. This is the conditional aggregation pattern from the funnel example, applied to your own search.
The outer query computes per-stage conversion as a percentage of applied. NULLIF guards against channels with zero applications.
The output gives you a per-channel funnel: "10 cold applications, 5% to offer; 6 referral applications, 33% to offer." The bottleneck is now visible by channel.
If your overall pct_to_sql is below 20%, the bottleneck is resume / CV (cold-screen rejects). If your pct_to_offer from onsite is below 20%, the bottleneck is interview craft.

Output (toy data).

channel	applied	pct_to_sql	pct_to_takehome	pct_to_onsite	pct_to_offer
cold	10	30.0	10.0	10.0	0.0
recruiter	6	83.3	50.0	33.3	16.7
referral	5	80.0	60.0	40.0	20.0

Rule of thumb. Track your application pipeline like a data product. The number to optimise is applications × conversion-to-offer. Channel diversification (referrals, recruiters, cold) compounds.

Worked example — drill-plan progress tracking

Detailed explanation. Sixty SQL problems sounds like a lot in week 5 and an inconvenient amount in week 10. Tracking weekly progress prevents the mid-program drift that costs most candidates a month.

Question. Given a drill_log table with (problem_id, topic, difficulty, solved_at), compute weekly cumulative problems solved by topic and flag weeks that fell below the 8-per-week pace.

Input (toy).

problem_id	topic	difficulty	solved_at
101	joins	medium	2026-05-04
102	window	medium	2026-05-05
103	nulls	easy	2026-05-06
104	joins	hard	2026-05-10
105	aggregation	medium	2026-05-12

Code.

WITH weekly AS (
    SELECT
        DATE_TRUNC('week', solved_at) AS week_start,
        topic,
        COUNT(*) AS solved_this_week
    FROM drill_log
    GROUP BY 1, 2
),
running AS (
    SELECT
        week_start,
        topic,
        solved_this_week,
        SUM(solved_this_week) OVER (
            PARTITION BY topic
            ORDER BY week_start
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative
    FROM weekly
)
SELECT
    week_start,
    topic,
    solved_this_week,
    cumulative,
    CASE WHEN solved_this_week < 2 THEN 'under_pace'
         ELSE                           'on_pace'
    END AS pace_flag
FROM running
ORDER BY week_start, topic;

Step-by-step explanation.

The weekly CTE groups problems by ISO week and topic.
The running CTE adds a cumulative-within-topic running total via window function.
The outer query labels each (week, topic) cell as on-pace or under-pace based on a per-week-per-topic target of 2.
The dashboard built on top shows which topics are behind, so the next week's effort is allocated where it counts.

Output.

week_start	topic	solved_this_week	cumulative	pace_flag
2026-05-04	joins	2	2	on_pace
2026-05-04	window	1	1	under_pace
2026-05-04	nulls	1	1	under_pace
2026-05-11	joins	1	3	under_pace
2026-05-11	aggregation	1	1	under_pace

Rule of thumb. Treat your prep plan as a self-managed sprint. Weekly pace metrics catch drift early; the alternative is realising in week 9 that you've drilled joins to death and never touched window functions.

Worked example — outreach response-rate analysis

Detailed explanation. Cold-DM response rates are notoriously low. A clean A/B test of two message templates can lift your response rate from 5% to 15% — which means three times the interview pipeline for the same effort.

Question. You sent 200 LinkedIn DMs split between two templates. Compute the response rate per template and decide which one wins.

Input.

template	sends	responses
short_specific	100	14
long_generic	100	4

Code.

SELECT
    template,
    sends,
    responses,
    ROUND(100.0 * responses / NULLIF(sends, 0), 1) AS response_rate_pct,
    CASE
        WHEN responses * 1.0 / NULLIF(sends, 0)
             > 2 * MIN(responses * 1.0 / NULLIF(sends, 0)) OVER ()
        THEN 'clear_winner'
        WHEN responses * 1.0 / NULLIF(sends, 0)
             > 1.2 * MIN(responses * 1.0 / NULLIF(sends, 0)) OVER ()
        THEN 'better'
        ELSE 'no_clear_difference'
    END AS verdict
FROM outreach_test;

Step-by-step explanation.

Compute response rate per template via simple division with NULLIF on the denominator.
The CASE WHEN compares each template's rate to the minimum across templates (used as a baseline). A rate >2x the baseline is a clear winner; >1.2x is "better but worth more data."
The window-function trick — MIN(...) OVER () — lets the comparison happen in one query, no self-join required.
Real significance would require sample size and a proper proportion test; this SQL gives a directional answer for inside the prep loop.

Output.

template	sends	responses	response_rate_pct	verdict
long_generic	100	4	4.0	no_clear_difference
short_specific	100	14	14.0	clear_winner

Rule of thumb. Treat outreach templates like product copy: A/B test them, measure response rate, and double down on the winner. Three lines of crisp, specific, mutual-interest text beats five paragraphs of generic enthusiasm.

SQL interview question on offer-decision scoring

A senior interviewer often closes the case-study round with: "You have three offers in hand. Walk me through how you'd pick. Bonus points for a query that ranks them."

Solution Using a weighted scoring rubric

WITH offers AS (
    SELECT 'FAANG NYC'    AS company, 5 AS salary_score, 4 AS growth_score,
                                       3 AS team_score,   3 AS mission_score UNION ALL
    SELECT 'Climate startup', 3,       5,                 5,                 5 UNION ALL
    SELECT 'Hybrid London',   4,       4,                 4,                 3
),
weights AS (
    SELECT 0.30 AS w_salary, 0.30 AS w_growth, 0.25 AS w_team, 0.15 AS w_mission
)
SELECT
    o.company,
    o.salary_score, o.growth_score, o.team_score, o.mission_score,
    ROUND(
          w.w_salary  * o.salary_score
        + w.w_growth  * o.growth_score
        + w.w_team    * o.team_score
        + w.w_mission * o.mission_score
    , 2) AS weighted_score
FROM offers o
CROSS JOIN weights w
ORDER BY weighted_score DESC;

Step-by-step trace.

company	salary	growth	team	mission	weighted_score
FAANG NYC	5	4	3	3	4.20
Climate startup	3	5	5	5	4.40
Hybrid London	4	4	4	3	3.85

The weighted-score column collapses four axes into a single ranking number under explicit, defensible weights. Tweaking the weights (e.g. mission-first vs salary-first) gives a sensitivity test — does the same offer still win if mission is bumped to 30%? That tells you how robust your decision is.

Output:

company	weighted_score
Climate startup	4.40
FAANG NYC	4.20
Hybrid London	3.85

Why this works — concept by concept:

CROSS JOIN with a single-row weights table — broadcasts the weights to every offer row without manual repetition. Changing the weights = updating one row.
Explicit weight declaration — the weights are visible in the SQL, in version control, in the writeup. The decision is no longer a vibe; it is a numerical defence.
Sensitivity check via weight tweak — re-run with different weights to see whether the winner flips. If the winner is robust under reasonable weight perturbations, you can commit.
One row per option — the ranking output is a one-glance summary you can share with a mentor or partner.
Cost — O(offers × 1) — the weights table has a single row. Trivial.

SQL
Topic — case expression
CASE expression problems (SQL)

Practice →

Cheat sheet — SQL analyst job-search recipes

One-line resume hook. "Analyst who ships dashboards stakeholders actually open." Replaceable variants — "metric I owned moved by X%," "shipped N dbt models in production," "ran the team A/B test that lifted conversion Y%." Pick one with a number.
Three-bullet impact format. "Drove metric direction by pct via mechanism (stack)." Lead the experience section with three of these. Cut everything that does not fit the pattern.
Portfolio readme template. Problem → data → SQL excerpt → 2-3 charts → finding → caveat. Five-section readme, two end-to-end projects, one polished chart-of-the-week-worthy headline.
SQL warm-up before every interview. Five medium DataLemur / PipeCode problems + one window-function deep dive (LAG, ROW_NUMBER, SUM OVER) the morning of. Slow muscles cost screens.
Salary negotiation script. "Based on market data and my experience, I'm targeting the upper half of the band for level metro. The 75th-percentile public benchmark is $X. My expectation lands between the median and that." Specificity is leverage.
LinkedIn search filter. "data analyst" + "SQL" + posted past 7 days + < 200 applicants. Save the search. Re-run weekly. Avoid postings older than 30 days.
Cold DM template. Line 1: opener referencing a specific signal (their dbt blog post, their analytics-engineer hire, their A/B test public talk). Line 2: your one-line hook + a relevant metric. Line 3: ask (15-min chat about open roles). 80 words, no longer.
Recruiter screen open. "I'm a level current-title who shipped metric impact using stack. I'm exploring target-role because reason." Inside 90 seconds. Then shut up and let them lead.
Take-home five-page rule. Problem → metric → SQL appendix → 2-3 charts → finding → recommendation → caveat. Five pages, three images, one decision. Cut everything else.
Onsite STAR-story prep. Five stories: shipped impact, ambiguous problem, stakeholder conflict, technical disagreement, failure / learning. Each story: 2 sentences of situation, 2 of task, 2 of action, 1 of result, 1 of learning. 90 seconds each.
Practice tracking pipeline. Notion or a Google Sheet with columns: Company, Role, Channel, Applied, Stage, Next step. Update daily; review weekly. Pipeline visibility = offer rate uplift.
Offer-decision rubric. Score each offer 1-5 on salary, growth, team, mission. Weight by your priorities. Below 3 on any axis is a yellow flag; below 2 is red. Above 4 on three axes is a clear accept.
Window-function muscle memory. Memorise five window patterns: ROW_NUMBER OVER (PARTITION BY ... ORDER BY ...), SUM OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), LAG / LEAD OVER (PARTITION BY ... ORDER BY ...), RANK / DENSE_RANK OVER (ORDER BY ... DESC), PERCENT_RANK OVER (...). Five patterns cover ~80% of probes.

Frequently asked questions

Are SQL data analyst jobs still in demand in 2026?

Yes — strongly. The US, UK, EU, and India markets are all net-positive on data-analyst postings YoY into 2026, with fintech, healthcare, climate, and AI-tooling sectors driving the growth. LLMs raised the bar (more skills expected per posting) but did not collapse demand — the analyst role's core value is in framing the metric and negotiating with stakeholders, which AI cannot do. Expect SQL + Python + one BI tool + dbt familiarity as the modern baseline. The market is healthiest for mid-level analysts with three to five years of experience; junior bands are competitive but plentiful.

What is the average salary for a SQL data analyst in 2026?

In the US, the mid-level base band is $100-145K in NYC / SF Bay and $90-125K in Austin / Chicago / Atlanta. Senior bands run $145-200K base in NYC / SF Bay, with FAANG total comp reaching $250-350K when equity is included. London mid-level bands run £60-90K base; Berlin €65-95K; Amsterdam €70-100K. Bengaluru mid-level bands run INR 18-32 LPA at product / FAANG India, lower at standard GCCs. Specialty stacks like dbt + Snowflake + Looker command a 10-15% premium over generic SQL roles. Always benchmark your specific level / metro / employer-tier on Levels.fyi or internal comp tools — public averages are coarse-grained.

Which companies hire the most SQL data analysts?

The largest hirers in 2026 split between FAANG (Meta, Google, Amazon, Microsoft, Apple, Netflix) and fintech (Stripe, Plaid, Wise, Revolut, JPMC, Bloomberg, Goldman Sachs). Scale-ups with mature data teams — Airbnb, Pinterest, Spotify, Notion, Linear, Figma, Datadog, Snowflake, Databricks — are reliable employers with strong analyst cultures. Remote-first hirers include GitLab, Automattic, Zapier, Dbt Labs, Hashicorp, Posthog. India GCC hirers include Walmart, Target, Wells Fargo, Cisco, Goldman Sachs, Lowe's, and the product giants Flipkart, Razorpay, CRED, Swiggy, Zomato. Pick three target tiers (FAANG, scale-up, remote-first) and apply across all three for portfolio diversification.

How long does the average analyst interview loop take?

The median loop is two weeks end to end — recruiter screen (week 0), SQL screen (week 1), take-home (week 1-2), virtual onsite (week 2), offer (week 2-3). FAANG loops add a bar-raiser round and stretch to three to four weeks. Startups often skip the take-home and compress the loop to one week. The total time investment for the candidate is roughly 8-12 hours of interview time across the full loop, plus 3-5 hours of take-home work and 3-5 hours of prep per loop. Block 15-20 hours per active loop in your calendar and run no more than three concurrent loops to keep quality high.

Do I need Python to land a SQL data analyst job?

For pure data-analyst roles, Python is nice-to-have but not blocking — SQL + one BI tool + stakeholder craft is enough at most companies. For analytics-engineer roles, Python (or a Python-adjacent tool like dbt + Jinja) is required. For senior data-analyst roles at scale-ups, Python for ad-hoc analysis (pandas, plotly) and basic ML literacy (scikit-learn, statsmodels) raise your salary band by 5-10% and unlock the case-study round at experimentation-heavy teams. The fastest path: ship one portfolio project that uses Python for the data prep and SQL for the metric layer — that signals both axes without forcing a deep dive.

What's the best way to break into analytics with no experience?

Build two end-to-end portfolio projects on public datasets (Stack Overflow public BigQuery, NYC TLC, Spotify tracks). Each project should be a problem → SQL → 2-3 charts → finding → readme artifact — not a tutorial walkthrough. Pair each project with a single LinkedIn post explaining the finding in plain English. Then apply to "Junior Data Analyst" and "Data Analyst I" roles at companies that hire from non-traditional backgrounds (Stripe, Wise, GitLab, Dbt Labs, mid-size health-tech). Layer in referrals via alumni networks; cold applications without referrals convert at ~2%, referrals at ~20%. Expect 3-6 months from "no experience" to "first offer" if the portfolio and outreach cadence are honest. PipeCode's SQL for data engineering interviews course → shortcuts the SQL-fundamentals phase substantially.

Practice on PipeCode

Drill the aggregation practice library → for the COUNT / SUM / AVG family that analyst screens hammer on.
Rehearse on window functions problems → for the ROW_NUMBER / LAG / SUM-OVER probes that decide most onsite rounds.
Sharpen joins drills → for the INNER / LEFT / FULL-OUTER patterns at the heart of every case study.
Layer in CASE expression problems → for conditional aggregation and funnel construction.
Stack null-handling drills → — every interviewer probes it, most candidates underprep it.
Practice cohort analysis problems → for retention curves and the canonical 30-day-retention case study.
Tighten CTE drills → — the readability lever that earns partial credit on every onsite.
Target Meta SQL drills →, Amazon SQL drills →, Google SQL drills →, and Stripe SQL drills → for company-specific patterns.
For the broader surface, read top data engineering interview questions →.
Stack the prerequisites with the only 5 skills you need to become a data engineer →.
Sharpen the SQL axis end-to-end via the SQL for data engineering interviews course →.
For long-form metric craft, work through product sense and metrics for data engineering interviews →.
Round out the behavioural axis with behavior interview prep for data engineering interviews →.

Pipecode.ai is Leetcode for Data Engineering — every SQL pattern, salary benchmark, and interview probe in this guide ships with hands-on practice rooms where you write the window function, the funnel CTE, and the cohort retention query against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so your `sql data analyst jobs` prep loop closes the gap between "I read about it" and "I just shipped it in 18 minutes flat under timer."

Practice SQL aggregation now →
Window function drills →

1. State of the SQL data analyst market in 2026

The analyst role survived the LLM wave by absorbing analytics engineering and stakeholder storytelling

Worked example — sizing your "metric ownership" claim

Worked example — auditing an AI-generated SQL query

Worked example — picking the right title to apply for

SQL interview question on market signal detection

Solution Using a job-post telemetry funnel

2. Top hiring markets — US, EU, India, remote

Five metros plus one global remote segment hold 80% of the SQL data analyst job posts in 2026

Worked example — narrowing your target metro list

Worked example — remote vs hybrid trade-off ranking

Worked example — sector growth detection

SQL interview question on top-N market ranking

Solution Using ROW_NUMBER() window with PARTITION BY

3. Salary bands by region and seniority

Base salary tells half the story — total compensation, equity vest, and cost-of-living-adjusted net pay tell the rest

Worked example — translating a remote offer into your local-comp frame

Worked example — total-compensation projection over a 4-year vest

Worked example — negotiation leverage from a benchmark query

SQL interview question on salary-band normalisation

Solution Using a currency-and-COL adjustment join

4. Anatomy of the SQL analyst interview loop

Five stages, two-week median, one rubric — practice all five, not just SQL

Worked example — the canonical "second-highest salary" interview probe

Worked example — running totals over a partition (window-function muscle)

Worked example — funnel analysis with conditional aggregation

Worked example — A/B test significance check

SQL interview question on retention curves (case-study probe)

Solution Using a self-join + DATE_DIFF cohort pattern

5. 2026 application roadmap — CV, portfolio, prep, outreach

Twelve weeks, four phases, one tracking system — the roadmap most SQL data analyst hires actually follow

Worked example — application-pipeline tracking SQL

Worked example — drill-plan progress tracking

Worked example — outreach response-rate analysis

SQL interview question on offer-decision scoring

Solution Using a weighted scoring rubric

Cheat sheet — SQL analyst job-search recipes

Frequently asked questions

Are SQL data analyst jobs still in demand in 2026?

What is the average salary for a SQL data analyst in 2026?

Which companies hire the most SQL data analysts?

How long does the average analyst interview loop take?

Do I need Python to land a SQL data analyst job?

What's the best way to break into analytics with no experience?

Practice on PipeCode

Solution Using `ROW_NUMBER()` window with `PARTITION BY`