10 Data Analyst Interview Questions You Should Be Ready For (2025) πΌπ
1οΈβ£ Q: Explain the difference between INNER JOIN and LEFT JOIN.
β
INNER JOIN returns only matching records between tables.
β
LEFT JOIN returns all records from the left table + matched records from the right (NULLs if no match).
Example:
Get all customers and their orders:
- INNER JOIN β shows only customers who placed orders.
- LEFT JOIN β shows all customers, even those with no orders.
2οΈβ£ Q: What are window functions in SQL?
Window functions perform calculations across a set of rows related to the current row.
Example:
Use ROW_NUMBER() to rank sales per region without collapsing data:
sql
SELECT region, sales, ROW_NUMBER() OVER(PARTITION BY region ORDER BY sales DESC)
FROM sales_data;
3οΈβ£ Q: How do you handle missing or duplicate data in a dataset?
- For missing data: use imputation (mean/median), drop, or flag it.
- For duplicates: use .drop_duplicates() in Python or DISTINCT in SQL. Approach depends on context and business impact.
4οΈβ£ Q: Describe a situation where you derived insights that influenced a business decision.
Example: Analyzed user engagement data β found drop in mobile retention. Suggested UX improvements β increased retention by 15%.
5οΈβ£ Q: Whatβs the difference between correlation and causation?
- Correlation = relationship (e.g., A & B move together)
- Causation = A directly causes B Always validate with experiments or domain expertise.
6οΈβ£ Q: How would you optimize a slow SQL query?
- Check for missing indexes
- Avoid SELECT *
- Use EXPLAIN plans
- Limit subqueries
- Optimize joins and filters
7οΈβ£ Q: Explain the use of GROUP BY and HAVING in SQL.
- GROUP BY aggregates data by columns (e.g., total sales by region)
- HAVING filters aggregated results Example: sql SELECT region, SUM(sales) FROM orders GROUP BY region HAVING SUM(sales) > 10000;
8οΈβ£ Q: How do you choose the right chart for a dataset?
- Line chart β trends
- Bar chart β comparison
- Pie chart β proportion (rarely ideal)
- Scatter plot β relationships Always match chart to message and audience.
9οΈβ£ Q: Whatβs the difference between a dashboard and a report?
- Dashboard β Interactive, real-time, for quick insights
- Report β Static or scheduled, detailed summaries Dashboards = action; reports = reference.
π Q: Which libraries in Python do you use for data cleaning and analysis?
- pandas β data manipulation
- numpy β numerical ops
- matplotlib / seaborn β visualization
- sklearn β basic ML and preprocessing
Top comments (0)