DEV Community

Yue

Student Performance Dataset Analysis

Link to the dataset:
https://www.kaggle.com/datasets/grandmaster07/student-exam-performance-dataset-analysis

Hi everyone! I’m back with my very first research project. I’d love to hear your thoughts, questions, or any tips you might have - I’m all ears and eager to learn!

This analysis doesn't set out to prove any particular hypothesis; I'm using the dataset to test my own skills.

First, I looked at the correlations between the numeric variables in the dataset, starting with a heatmap:
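For anyone curious what the numbers in a correlation heatmap actually are, here is a minimal sketch of the Pearson coefficient computed by hand for every pair of columns. The column names (`Hours_Studied`, `Attendance`, `Exam_Score`) and the toy values are my assumptions for illustration; they are not read from the Kaggle file, so the printed values will not match the ones quoted in this post.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy values made up for illustration (hypothetical, not from the dataset).
toy = {
    "Hours_Studied": [10, 15, 20, 25, 30],
    "Attendance": [60, 70, 85, 80, 95],
    "Exam_Score": [60, 64, 70, 69, 76],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Each cell of the heatmap is one of these coefficients: 1 on the diagonal, and the same value on both sides of it because the measure is symmetric.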

The biggest surprise? Attendance (0.58) correlates more strongly with Exam Score than Hours Studied does (0.45). It's not just about how much a person works, but how consistent they are. (Of course, this dataset is 'artificial', so it doesn't really describe real-life cases; don't take everything to heart.)

Apart from that, Exam Score has only a weak linear relationship with Previous Score and with Tutoring Sessions, so I treat them as secondary factors.

Looking at the heatmap again, the remaining correlations are negligible, so I won't pay much attention to them.

When plotting Exam Scores against School Type, the distributions are nearly identical. The median score stays flat across both categories.
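The "flat median" check can be sketched without any plotting: group the scores by category and compare the medians directly. The field names `School_Type` and `Exam_Score`, the category labels, and the rows below are assumptions made up for illustration, not values from the dataset.

```python
from collections import defaultdict
from statistics import median


def median_by_group(rows, group_key, value_key):
    """Return {group: median of value_key} for the given rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: median(values) for group, values in groups.items()}


# Hypothetical rows, only to show the shape of the comparison.
rows = [
    {"School_Type": "Public", "Exam_Score": 65},
    {"School_Type": "Public", "Exam_Score": 67},
    {"School_Type": "Public", "Exam_Score": 70},
    {"School_Type": "Private", "Exam_Score": 66},
    {"School_Type": "Private", "Exam_Score": 67},
    {"School_Type": "Private", "Exam_Score": 71},
]

medians = median_by_group(rows, "School_Type", "Exam_Score")
print(medians)
```

If the medians for the two categories come out (almost) equal, that is the same thing the box plot shows as two centre lines at the same height.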

I ran an outlier analysis to identify 'High-Efficiency' students: those who outperformed their peers despite lower-than-average study hours. I defined this group as students whose study time was below the mean but whose exam scores were at least 10 points above the population average.
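Here is a rough sketch of that filter, assuming the rule is exactly "hours below the mean and score at least 10 points above the mean score". The field names (`Student`, `Hours_Studied`, `Exam_Score`, `Parental_Involvement`) and the five rows are hypothetical, chosen only so the example runs on its own.

```python
from collections import Counter
from statistics import mean


def high_efficiency(rows, margin=10):
    """Rows with below-mean study hours and a score >= mean score + margin."""
    mean_hours = mean(r["Hours_Studied"] for r in rows)
    mean_score = mean(r["Exam_Score"] for r in rows)
    return [
        r for r in rows
        if r["Hours_Studied"] < mean_hours
        and r["Exam_Score"] >= mean_score + margin
    ]


# Hypothetical students: mean hours is 20, mean score is 70.
students = [
    {"Student": "A", "Hours_Studied": 10, "Exam_Score": 85, "Parental_Involvement": "High"},
    {"Student": "B", "Hours_Studied": 20, "Exam_Score": 65, "Parental_Involvement": "Low"},
    {"Student": "C", "Hours_Studied": 30, "Exam_Score": 70, "Parental_Involvement": "Medium"},
    {"Student": "D", "Hours_Studied": 20, "Exam_Score": 66, "Parental_Involvement": "Medium"},
    {"Student": "E", "Hours_Studied": 20, "Exam_Score": 64, "Parental_Involvement": "Low"},
]

selected = high_efficiency(students)
# Tally one categorical column inside the selected group.
involvement = Counter(r["Parental_Involvement"] for r in selected)
print([r["Student"] for r in selected], dict(involvement))
```

Counting a categorical column inside the filtered group, as in the last step, is one simple way to see which background factors the outliers share.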

When I looked at the 'High-Efficiency' students, I found that there are no true shortcuts. The data is remarkably consistent: such outliers are rare, and the few that exist usually have Medium-to-High Parental Involvement to bridge the gap.
