Top 20 Amazon Data Manipulation SQL Interview Questions (2026)

#interview #amazon #datamanipulationsql #questions

Amazon's Data Manipulation SQL interviews usually focus on real analytics work: event logs, customer behavior, joins across messy tables, and ranking or aggregation logic with edge cases. You should expect questions that check whether you can move from a business prompt to clean SQL quickly, while keeping duplicates, nulls, and time logic under control.

User lifecycle and event stream questions

Retrieve First Active and Last Inactive Dates per User

This is a classic state-transition question. Interviewers want to see whether you can isolate conditional dates with grouped aggregations or window functions, and whether you notice users who may never hit one of the statuses.
Explore Subscription Patterns and Status Transitions with SQL/Pandas

Amazon likes problems where raw status rows need to be turned into a user journey. The key is to think in ordered events, compare each row to the prior or next status, and define transitions clearly before writing code.
Process real-time enter/exit events and actives

This checks whether you understand event streams instead of static snapshots. A good approach usually starts with ordering events by timestamp, then deriving active counts or intervals while handling out-of-order or repeated records carefully.
Write SQL window functions for D7 retention

Retention questions are common because they mix business logic with date arithmetic. Interviewers are looking for correct cohort definitions, clean joins from signup date to return activity, and comfort with window functions or self-joins.
Verify subscriptions and analyze orders with SQL/Python

This kind of question checks whether you can connect subscription validity to downstream behavior like ordering. The hard part is usually joining time-bounded records to fact tables without double counting.

Customer spend, revenue, and ranking

Identify Top Spenders and Segment Customers Using Python

Even though this one uses Python, the core thinking is analytical data manipulation. It tests whether you can aggregate customer spend, define segmentation rules, and communicate assumptions around ties or cutoff thresholds.
Consolidate and Rank Global Salaries in USD

This is a good example of "business logic before SQL syntax." You need to normalize values across currencies, join exchange rates correctly, then rank results in a way that matches the prompt.
Calculate Regional Revenue and Identify Top Customers

Amazon often asks revenue questions because they show whether you can join dimensional data to transactions cleanly. Expect to group at one grain, then rank within region or market without losing detail.
Select Top Customers Using Transaction Data Filters

This tests your discipline with filtering before aggregation. A lot of candidates get tripped up by applying conditions at the wrong stage, so think carefully about row-level filters versus post-aggregation filters.
Find top-spend categories per customer with ranking

Per-entity ranking is one of the most common interview patterns. The interviewer wants to see partitioned ranking functions, tie handling, and whether you aggregate to the right customer-category grain first.
Design SQL/Pandas aggregations on retail schema

Retail schema questions are broad on purpose. They test whether you can inspect the schema, decide the correct joins and grouping keys, and avoid common mistakes like duplicated revenue from one-to-many joins.

Recommendation logic and pair generation

Find recommended friend pairs by shared songs

This is a pair-generation problem disguised as recommendation logic. It checks whether you can self-join interaction data, count overlap, and remove existing friend pairs and duplicate pair orderings like A-B versus B-A.
Find recommended friend pairs by shared listening

This easier version still covers a pattern that comes up often in interviews. Focus on defining "shared listening" clearly, then deduplicate user pairs and aggregate shared behavior at the correct level.
Identify Unique Unordered City Pairs in Flight Log

This is another deduplication-by-canonical-order question. The interviewer is checking whether you know how to normalize pairs with LEAST/GREATEST or similar logic so reverse directions collapse into one record.

Clickstream, joins, and counting logic

Compute unique visitors per department from clicks

Unique counting questions sound simple, but they often hide many-to-one mapping issues. A good answer starts by deciding what counts as a visitor, what counts as a department visit, and where duplicates enter the join path.
Compute join counts and window ranks

This kind of prompt checks raw SQL fluency. You need to stay organized. First get the join output at the intended grain, then compute counts, then apply ranking without mixing grouped and windowed logic incorrectly.
Identify Date with Highest Total kWh Consumption Using Pandas

This is a straightforward aggregation problem, but interviewers still use it because it shows how you handle parsing, grouping, and selecting extrema. The main thing is to aggregate by the correct date grain before searching for the max.

Experimentation and analytical modeling

Build DID panel and compute effects in SQL A difference-in-differences question is less about advanced econometrics and more about clean panel construction. Interviewers want to see whether you can build treatment/control and pre/post slices correctly, then aggregate into the effect formula without mixing units.

Data engineering patterns in SQL interviews

Design idempotent daily loads with deduping Amazon interviews often blur the line between analytics SQL and data engineering. This question tests whether you understand stable reruns, deduplication keys, late-arriving records, and how merge or insert logic changes under production constraints.

Text processing and defensive data handling

Implement robust word counts and min/max This checks whether you can deal with dirty text and basic summary metrics without making brittle assumptions. Expect edge cases around punctuation, whitespace, casing, null values, and what exactly should count as a word.

What these Amazon questions are really testing

Across these 20 questions, a few themes show up again and again. Amazon likes SQL prompts where the table grain is easy to misunderstand, where filtering order changes the answer, and where duplicates can quietly break your result. Window functions, self-joins, conditional aggregation, and date logic matter a lot.

If you have an Amazon interview coming up, practice writing queries from scratch on event data, transaction data, and recommendation-style pair problems. Practice explaining your assumptions out loud too, because interviewers often care as much about your reasoning as the final query.

PracHub has 65+ Amazon Data Manipulation SQL questions, and this list is a good place to start if you want the most discussed ones first.

DEV Community