Most people prepare for a data analyst interview like it's a SQL exam. It isn't. The loop tests three completely different skills in sequence: technical SQL, open-ended business case thinking, and behavioral storytelling. Over-prepare on one and you'll bomb the others.
Here's the breakdown that actually maps to what interviewers are scoring you on in 2026 — with real question patterns, strong answer structures, and the failure modes that knock candidates out.
What Every DA Interview Round Actually Tests
| Round | What They're Scoring | Most Common Failure |
|---|---|---|
| SQL Technical | Accurate, optimized queries | Forgetting NULLs, wrong JOIN type |
| Business Case | Metrics thinking, not just data pulling | Jumping to analysis before defining the metric |
| Behavioral | Did you drive decisions, or just build reports? | "I built a dashboard" with no stated impact |
| Stats / Probability | Do you understand your own outputs? | Confusing correlation with causation |
The biggest pattern I've seen: candidates who can write perfect SQL answers but who treat business case questions as SQL exercises. They don't work the same way.
SQL: The 3 Patterns You'll Hit in Every Loop
1. Monthly retention rate
This one comes up constantly. The correct approach uses CTEs and a self-join on the user activity table with a one-month offset:
WITH monthly_active AS (
SELECT user_id, DATE_TRUNC('month', event_date) AS activity_month
FROM events
GROUP BY 1, 2
),
retention AS (
SELECT
m1.activity_month AS cohort_month,
COUNT(DISTINCT m1.user_id) AS users_in_month,
COUNT(DISTINCT m2.user_id) AS users_retained_next_month
FROM monthly_active m1
LEFT JOIN monthly_active m2
ON m1.user_id = m2.user_id
AND m2.activity_month = m1.activity_month + INTERVAL '1 month'
GROUP BY 1
)
SELECT cohort_month, users_in_month, users_retained_next_month,
ROUND(100.0 * users_retained_next_month / users_in_month, 2) AS retention_rate_pct
FROM retention ORDER BY cohort_month;
Most candidates write a subquery version that works but doesn't scale. The CTE form signals you think about readability and maintainability.
2. JOIN type selection
INNER vs LEFT JOIN is a setup for the real question: "Which is faster?" Don't answer "LEFT JOIN is slower" — that signals a misunderstanding. Speed depends on data distribution and indexes, not the join type.
The correct answer: use LEFT JOIN when you need to preserve the full universe of your base table regardless of whether a match exists on the other side. Using INNER JOIN when you want "all users who haven't purchased yet" silently drops them from the output.
3. Running totals with window functions
Know the difference between SUM(revenue) OVER (ORDER BY order_date) (global running sum) vs adding PARTITION BY user_id (per-user running sum). This is the most common window function gap.
Business Case: Structure Separates Candidates
The business case round doesn't have a correct answer — it tests whether you think in hypotheses before reaching for SQL.
Classic question: "Our website traffic dropped 40% last Tuesday. Walk me through your analysis."
Weak answer: "I'd query the events table by date and look for the drop."
Strong answer structure:
- Confirm the signal — is it all sources or one channel? All pages or one section?
- Segment — break by source/medium, device, geo, landing page
- Correlate with events — deploys, campaigns, algorithm changes, competitor actions
- Form 2–3 hypotheses, then pull SQL to validate each
- Communicate the most likely cause with a confidence level
The distinction: you're forming a hypothesis first, then using SQL to test it — not the reverse.
Engagement metric design
When asked to define a user engagement metric, "daily active users" and "time on site" are weak answers. A better answer ties the metric to a business outcome: for a SaaS product, "core action completion rate" (percentage of sessions where the user completed the product's primary promise) predicts retention significantly better than time on screen.
Behavioral: Business Impact or It Doesn't Count
The formula that works:
What decision was being made → what data revealed → what you recommended → what happened
Weak: "I analyzed churn data and built a dashboard."
Strong: "We were about to launch a generic win-back campaign to all churned users from the past year. I found that 40% had churned due to billing failures, not product dissatisfaction. I segmented these users and recommended a separate reactivation flow. The targeted campaign had 3.2× the reactivation rate."
The key isn't the analysis — it's what decision changed because of it.
Stats: Two Questions That Separate Candidates
P-values: A p-value of 0.04 does not mean there's a 96% chance your hypothesis is correct. It means: if nothing were going on, this result would occur by chance 4% of the time. Many candidates — and many practitioners — get this backwards.
Statistical vs. practical significance: A 0.1% conversion lift can be statistically significant at p < 0.05 while being practically useless. Always pair significance with effect size. In business settings, the question isn't "is this real?" but "is this worth building?"
Preparing for the Actual Interview Format
Reading answer structures isn't the same as being ready to say them out loud under time pressure. SQL rounds are typically 30–45 minutes for 2–3 problems. Business case rounds run 20–30 minutes. Both require a pacing component that practice with notes doesn't build.
Been using ManyOffer to sharpen my own answers — if you want AI mock interviews with real LP feedback, they have a deal running through July worth checking out: Claim 1 free month here
Top comments (0)