I spent months collecting and organizing real data engineering interview questions from 97+ companies including Amazon, Google, Databricks, Goldman Sachs, Walmart, and Meta.
The result: **1,863 questions** across 7 categories, each with a Senior/Principal-level answer.
Here's what I learned about what top companies actually ask.
## The 7 Categories (and their weight in real interviews)
| Category | Questions | Interview Weight |
| ---------------- | --------- | ------------------------- |
| SQL | 487 | Nearly every interview |
| Spark / Big Data | 452 | Critical for senior roles |
| System Design | 179 | The make-or-break round |
| Python / Coding | 179 | Usually 1–2 rounds |
| Cloud / Tools | 179 | AWS, GCP, Airflow, dbt |
| Behavioral | 144 | Often underestimated |
| Fundamentals | 243 | Phone screen staples |
## The Surprising Patterns
### 1. SQL is 90% of phone screens
Almost every company starts with SQL, but it's not just `SELECT * FROM`. The topics that came up most often in the questions I collected:
- **Window functions** (ROW_NUMBER, RANK, LAG/LEAD) — asked at 70%+ of companies
- **Self-joins and anti-joins** — Amazon's favorite
- **Query optimization** — "This query takes 45 minutes. Fix it."
- **Recursive CTEs** — Goldman Sachs asks these regularly
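To make two of these patterns concrete, here is a minimal sketch of an anti-join and a recursive CTE, run against an in-memory SQLite database from Python. The tables (`customers`, `orders`, `employees`) and their rows are made up for illustration and are not taken from any company's actual question.

```python
import sqlite3

# Hypothetical toy schema, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);

    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Dee', NULL), (2, 'Eli', 1), (3, 'Fay', 2);
""")

# Anti-join: customers that have no matching row in orders.
no_orders = conn.execute("""
    SELECT c.name
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

# Recursive CTE: walk the reporting chain from the top-level manager down.
org_chart = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, chain.depth + 1
        FROM employees e
        JOIN chain ON e.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth
""").fetchall()

print(no_orders)   # customers without orders
print(org_chart)   # (name, depth) pairs, top of the hierarchy first
```

In an interview it is usually worth mentioning that `NOT EXISTS` expresses the same anti-join, and that a recursive CTE needs a terminating condition (here, the hierarchy simply runs out of rows).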
### 2. System Design separates Senior from Staff
The gap between candidates at these levels isn't SQL knowledge; it's **system design thinking**. The top questions I found:
- "Design a real-time analytics pipeline for e-commerce"
- "How would you handle late-arriving data in a streaming pipeline?"
- "Design a data warehouse for a ride-sharing company"
What makes a great answer isn't the architecture — it's explaining **trade-offs**:
- Why Kafka over RabbitMQ for *this specific use case*?
- What's the CAP theorem trade-off you're making?
- What happens when this component fails, and what is the blast radius?
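For the late-arriving data question, it can help to have a tiny mental model of watermarks ready. The sketch below is a simplified, pure-Python illustration and not how any particular streaming engine implements it. The names `WINDOW_SECONDS`, `ALLOWED_LATENESS`, and `aggregate` are assumptions made up for this example. The trade-off it exposes is the one interviewers tend to probe: a larger allowed lateness means more complete results but windows that stay open (and hold state) longer.

```python
from collections import defaultdict

WINDOW_SECONDS = 60     # tumbling window size (assumed for illustration)
ALLOWED_LATENESS = 30   # how far the watermark trails the newest event (assumed)


def aggregate(events):
    """Sum values per tumbling window, dropping events that arrive too late.

    `events` is an iterable of (event_time_seconds, value) in arrival order.
    Returns (sums_by_window_start, dropped_events).
    """
    sums = defaultdict(int)
    dropped = []
    max_event_time = 0
    for event_time, value in events:
        # The watermark trails the largest event time seen so far.
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS

        window_start = event_time - event_time % WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS

        if window_end <= watermark:
            # The window is already closed; a real pipeline might route this
            # to a dead-letter store and repair it with a batch backfill.
            dropped.append((event_time, value))
        else:
            sums[window_start] += value
    return dict(sums), dropped


# Events in arrival order; (50, 1) is late but within tolerance, (55, 1) is not.
events = [(5, 1), (70, 1), (50, 1), (130, 1), (55, 1)]
window_sums, dropped = aggregate(events)
print(window_sums)
print(dropped)
```

Here the event at time 50 arrives after the event at time 70 but is still counted, because the watermark (40) has not yet passed the end of its window. The event at time 55 arrives once the watermark has reached 100, so it is dropped.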
### 3. Behavioral rounds are pass/fail gates
I was surprised how many senior candidates get rejected in behavioral rounds. The pattern:
- **Amazon**: Entirely focused on Leadership Principles (LPs). Every answer needs to map to one.
- **Google**: "Tell me about a time you disagreed with a technical decision"
- **Meta**: Focus on impact metrics ("What was the business result?")
The STAR method (Situation, Task, Action, Result) works for all of them. But your Result needs **numbers**.
### 4. Company-specific patterns are real
After mapping questions to companies, clear patterns emerged:
- **Amazon**: Heavy on SQL optimization + Leadership Principles
- **Google**: System Design + coding fundamentals
- **Databricks**: Spark internals (shuffle, partitioning, Catalyst optimizer)
- **Goldman Sachs**: SQL edge cases + data quality/governance
- **Snowflake**: Their own architecture + query optimization
## What I Built
I turned this into [DataEngPrep.tech](https://dataengprep.tech), a platform where you can browse all 1,863 questions for free, with partial answer previews.
Every question page shows:
- The question text
- Which companies ask it
- Difficulty level and category
- A preview of the expert answer (roughly the first 500 characters)
- The full answer, behind a paywall
The full answers go deep — trade-offs, architecture diagrams for System Design, and a "Pro-Tip" on every question (either a common mistake to avoid or a technique that impresses interviewers).
## 5 Questions You Should Practice Right Now
If you have a data engineering interview coming up, practice these — they appear everywhere:
1. **"Explain the difference between a star schema and snowflake schema. When would you use each?"** — Tests data modeling fundamentals
2. **"How would you optimize a slow-running Spark job?"** — Tests production experience (hint: start with shuffle reduction, then partitioning)
3. **"Design a data pipeline that handles late-arriving events"** — Tests system design + real-world awareness
4. **"Write a SQL query to find the second-highest salary in each department"** — Tests window functions (the #1 most-asked SQL pattern)
5. **"Tell me about a time you had to make a technical decision with incomplete information"** — Tests decision-making under uncertainty
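As a starting point for question 4, here is one possible answer using `DENSE_RANK`, sketched against an in-memory SQLite database (window functions need SQLite 3.25 or newer). The `employees` table and its rows are invented for illustration. `DENSE_RANK` is chosen over `ROW_NUMBER` so that ties for the top salary don't push a tied top earner into the "second-highest" slot.

```python
import sqlite3

# Hypothetical sample data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 'eng', 200), ('Ben', 'eng', 200), ('Cy', 'eng', 150),
        ('Dee', 'ops', 120), ('Eli', 'ops', 100),
        ('Fay', 'hr', 90);
""")

# Rank salaries within each department, then keep rank 2.
second_highest = conn.execute("""
    SELECT department, name, salary
    FROM (
        SELECT department, name, salary,
               DENSE_RANK() OVER (
                   PARTITION BY department
                   ORDER BY salary DESC
               ) AS rnk
        FROM employees
    )
    WHERE rnk = 2
    ORDER BY department
""").fetchall()

print(second_highest)
```

Two edge cases are worth raising out loud: departments with a single distinct salary (like `hr` here) return no row, and if several people share the second-highest salary, all of them are returned.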
---
If you're prepping for a DE interview, check out [DataEngPrep.tech](https://dataengprep.tech). All 1,863 question pages are free to browse.
What's the hardest interview question you've been asked? Drop it in the comments — I'll add it to the collection. 👇