Losing great employees is incredibly expensive for businesses. To show potential employers how I tackle real-world business problems using data engineering and visualization, I built an end-to-end HR Attrition Analysis project using the classic IBM HR Analytics dataset (1,470 employees, 35 features).
Here is exactly how I took this raw data from local SQL ingestion to an executive-ready Power BI dashboard.
๐๏ธ Step 1: Database Ingestion & Quality Checks (MySQL)
Enterprise data lives in relational databases, not flat CSV files. I started by spinning up a local schema in MySQL Workbench and importing the raw dataset.
Before running metrics, I performed a "sanity check" to ensure data integrity. I verified that there were zero duplicate records using the unique EmployeeNumber key and checked for missing values:
SQL
-- Checking for duplicates on the primary key
SELECT EmployeeNumber, COUNT(*)
FROM hr_employee_attrition
GROUP BY EmployeeNumber
HAVING COUNT(*) > 1;
Result: 0 duplicates. The structural data health was clean.
๐งน Step 2: Data Cleaning & Transformation
A common mistake is overloading a BI tool with uncleaned data. To optimize performance, I built a permanent Database View to drop zero-variance columns (like StandardHours, which was identical for every employee) and transform text fields into binary indicators ($1$ and $0$).
SQL
CREATE VIEW vw_hr_attrition_clean AS
SELECT
EmployeeNumber, Age, Department, JobRole, MonthlyIncome, YearsAtCompany,
CASE WHEN Attrition = 'Yes' THEN 1 ELSE 0 END AS Attrition_Flag,
CASE WHEN OverTime = 'Yes' THEN 1 ELSE 0 END AS OverTime_Flag
FROM hr_employee_attrition;
This thin architectural layer makes calculating exact percentages downstream incredibly fast.
๐ Step 3: Segmenting the Risk with SQL
Next, I used aggregation queries to pinpoint exactly where turnover was happening. I analyzed attrition rates across different departments and salary brackets:
SQL
-- Calculating Attrition Rate by Department
SELECT Department, COUNT(*) as Total_Employees,
ROUND(AVG(Attrition_Flag)*100, 2) as Attrition_Rate
FROM vw_hr_attrition_clean
GROUP BY Department
ORDER BY Attrition_Rate DESC;
๐ Step 4: Connecting & Modeling in Power BI
Instead of using static exports, I connected Power BI directly to my local MySQL server using Import Mode.
To maintain clean DAX architecture, I created a dedicated measure matrix table and wrote explicit KPIs rather than relying on default column summaries:
Total Employees = COUNT(vw_hr_attrition_clean[EmployeeNumber])
Total Attrition = SUM(vw_hr_attrition_clean[Attrition_Flag])
Attrition Rate = DIVIDE([Total Attrition], [Total Employees], 0)
The BI Dashboard
๐ก Step 5: High-Impact Business Takeaways
Data is just noise without strategic context. Based on the dashboard interactions, I identified three massive "flight risks" and drafted immediate HR action items:
The Overtime Smoking Gun: Employees logging chronic overtime exhibit a 30.6% attrition rate (3x higher than non-overtime peers).
Recommendation: Deploy an automated HR flag system when operational teams cross consecutive overtime thresholds.The 1-Year Tenure Cliff: Attrition is heavily concentrated among employees in their first 12 months (>30%).
Recommendation: Revamp onboarding tracks with structured 30/60/90-day sentiment check-ins.Sales Representative Volatility: Sales Reps had an outlier attrition rate of 39.8%, linked to low starting base pay (<$4k/month).
Recommendation: Restructure early compensation frameworks to favor a higher base salary over pure commission during year one.


Top comments (0)