DEV Community

Rubén Gil for Evolve


What 3.9M powerlifting records tell us about competition strategy — an EDA with Python



When I started this EDA project for my Data Science Master at Evolve, I picked the Open Powerlifting dataset because, beyond being a gym rat, I've always been curious about competition strategy in powerlifting.


The dataset

Open Powerlifting is an open-source project that tracks powerlifting competition results worldwide. The full dataset has ~3.9M rows and 42 columns covering athlete info, every single lift attempt, and performance metrics.
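Reading all ~3.9M rows with every column is memory-hungry. Below is a minimal sketch of loading only the needed columns. It assumes the column names of the OpenPowerlifting CSV export. `USECOLS`, `CATEGORICAL` and `load_openpowerlifting` are illustrative names, not code from the project.

```python
import pandas as pd

# Assumed OpenPowerlifting column names; adjust to the actual CSV header.
ATTEMPTS = [f"{lift}{n}Kg" for lift in ("Squat", "Bench", "Deadlift") for n in range(1, 5)]
USECOLS = {"Name", "Sex", "Equipment", "Age", "AgeClass", "BodyweightKg",
           "WeightClassKg", "Dots", "Tested", "Sanctioned", *ATTEMPTS}
CATEGORICAL = ["Sex", "Equipment"]


def load_openpowerlifting(source, usecols=USECOLS) -> pd.DataFrame:
    """Read only the columns the analysis needs from a path or file-like object."""
    # With a callable, wanted names that are absent from the file are simply
    # never matched (a plain list of missing names would raise instead).
    df = pd.read_csv(source, usecols=lambda col: col in usecols, low_memory=False)
    # Low-cardinality text columns take far less memory as categoricals.
    present = [c for c in CATEGORICAL if c in df.columns]
    return df.astype({c: "category" for c in present})
```

Usage would be `df = load_openpowerlifting("openpowerlifting.csv")`, with the file name being a placeholder for wherever the export lives.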

Before any analysis, I filtered it down to sanctioned, drug-tested competitions and kept only the columns I'd actually use. The main challenge: in the attempt columns, a negative value means a failed lift, not bad data. So I first built boolean columns to track success or failure, and only then converted the negatives to NaN.
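Here is a minimal sketch of that ordering. It assumes the OpenPowerlifting column names (`Sanctioned`, `Tested`, `Squat1Kg` ... `Deadlift4Kg`) and that both flags are stored as `'Yes'`. The helper names and the `_ok` suffix are hypothetical. The point is that the made/missed flag is computed while the sign is still available.

```python
import pandas as pd

LIFTS = ("Squat", "Bench", "Deadlift")
# Assumed OpenPowerlifting attempt columns: Squat1Kg ... Deadlift4Kg.
ATTEMPT_COLS = [f"{lift}{n}Kg" for lift in LIFTS for n in range(1, 5)]


def keep_sanctioned_tested(df: pd.DataFrame) -> pd.DataFrame:
    """Keep sanctioned, drug-tested entries (assumes both flags are stored as 'Yes')."""
    return df[(df["Sanctioned"] == "Yes") & (df["Tested"] == "Yes")].copy()


def add_attempt_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Record made/missed BEFORE negatives are cleaned away.

    <col>_ok is True (made), False (missed) or <NA> (attempt not taken).
    """
    out = df.copy()
    for col in ATTEMPT_COLS:
        if col not in out.columns:
            continue
        weight = out[col]
        # Nullable boolean so "not attempted" stays distinct from "missed".
        ok = (weight > 0).astype("boolean")
        ok[weight.isna()] = pd.NA
        out[f"{col}_ok"] = ok
        # Only now turn failed (negative) attempts into NaN.
        out[col] = weight.where(weight > 0)
    return out
```

Running the steps in the opposite order (NaN first, flags second) would mark every missed attempt as `<NA>`, which is exactly the "all my booleans are NaN" symptom.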


The process

The project is fully modularized in Python, using pandas, NumPy, seaborn, Matplotlib and pingouin. The pipeline runs end-to-end from main.py:

raw CSV → filter → clean → features → assert → analyze
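The stages above can be sketched as a list of DataFrame-to-DataFrame functions applied in order. This is a hypothetical shape for main.py, not the repository's actual code. `run_pipeline` and `check_no_negative_weights` are illustrative names, and the check shown is only one example of what an "assert" stage might verify.

```python
from typing import Callable

import pandas as pd

# A stage takes a DataFrame and returns a (possibly new) DataFrame.
Stage = Callable[[pd.DataFrame], pd.DataFrame]


def check_no_negative_weights(df: pd.DataFrame) -> pd.DataFrame:
    """'assert' stage: after cleaning, no numeric *Kg column may be negative."""
    numeric = df.select_dtypes("number")
    kg_cols = [c for c in numeric.columns if c.endswith("Kg")]
    # NaN < 0 evaluates to False, so missing or failed (NaN) attempts pass.
    if (numeric[kg_cols] < 0).any().any():
        raise ValueError("negative weights survived the cleaning stage")
    return df


def run_pipeline(raw: pd.DataFrame, stages: list[tuple[str, Stage]]) -> pd.DataFrame:
    """Apply stages in order and log the row count after each one."""
    df = raw
    for name, stage in stages:
        df = stage(df)
        print(f"{name:<10}{len(df):>12,} rows")
    return df
```

Raising `ValueError` instead of using a bare `assert` keeps the check active even when Python runs with `-O`, which strips assert statements.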

Imputation was conservative: age was derived from AgeClass ranges and bodyweight from the weight class, never from synthetic values. Remaining NaN values were filtered dynamically for each question rather than dropped up front.
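Here is a minimal sketch of that imputation. It assumes OpenPowerlifting-style values such as `'24-34'` for `AgeClass` and `'93'` or `'120+'` for `WeightClassKg`. The post doesn't say exactly how a range becomes a number, so two choices here are assumptions: taking the AgeClass midpoint, and using the weight-class limit as a bodyweight proxy. All function names are hypothetical too.

```python
import pandas as pd


def age_from_class(age_class: pd.Series) -> pd.Series:
    """Midpoint of an AgeClass like '24-34'; open-ended classes ('80-999') use the lower bound."""
    bounds = age_class.str.extract(r"^(\d+)-(\d+)$").astype(float)
    low, high = bounds[0], bounds[1]
    return ((low + high) / 2).where(high < 999, low)


def bodyweight_from_class(weight_class: pd.Series) -> pd.Series:
    """Weight-class limit as a bodyweight proxy: '93' -> 93, '120+' -> 120."""
    return pd.to_numeric(weight_class.str.rstrip("+"), errors="coerce")


def impute_from_classes(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps only from the athlete's own class columns; never invent values."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(age_from_class(out["AgeClass"]))
    out["BodyweightKg"] = out["BodyweightKg"].fillna(bodyweight_from_class(out["WeightClassKg"]))
    return out


def rows_for_question(df: pd.DataFrame, needed: list[str]) -> pd.DataFrame:
    """Drop NaNs only in the columns a given question actually needs."""
    return df.dropna(subset=needed)
```

Filtering per question with something like `rows_for_question(df, ["Age", "Sex", "Dots"])` keeps rows that are incomplete elsewhere but perfectly usable for the question at hand.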


Results

DOTS score by age and sex

Peak performance age: Athletes peak between 22 and 24 years old and decline steadily afterwards. Once scores are normalized for bodyweight with DOTS, there is no meaningful difference between men and women.
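One way to locate the peak is to take the median DOTS score for each whole year of age, per sex, and pick the maximum. This sketch assumes the dataset's `Dots`, `Age` and `Sex` columns. The median, the flooring of ages and the `min_n` threshold are illustrative choices, not necessarily the ones behind the chart.

```python
import pandas as pd


def peak_age_by_sex(df: pd.DataFrame, min_n: int = 100) -> pd.Series:
    """Age (whole years) with the highest median DOTS score, per sex."""
    data = df.dropna(subset=["Age", "Sex", "Dots"])
    # Floor fractional ages (e.g. 23.5) to whole years.
    data = data.assign(AgeYear=data["Age"].astype(int))
    curve = data.groupby(["Sex", "AgeYear"], observed=True)["Dots"].agg(["median", "size"])
    # Ignore thinly populated ages so a handful of lifters can't set the peak.
    curve = curve[curve["size"] >= min_n]
    peaks = curve["median"].groupby(level="Sex", observed=True).idxmax()
    # idxmax returns (Sex, AgeYear) tuples; keep just the age.
    return peaks.map(lambda key: key[1])
```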


Third-attempt fail rates for Squat, Bench and Deadlift

Where do athletes fail most? Bench press has a 54% fail rate on the 3rd attempt, while squat and deadlift sit around 36-40%. The gap is consistent across sexes and equipment types: bench just behaves differently.
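Fail rates like these need the sign information, so the sketch below works on the raw attempt columns (negative = missed, NaN = not taken) rather than the cleaned ones. Column names follow the OpenPowerlifting convention (`Squat3Kg` and so on). The function name and the `by` parameter are hypothetical.

```python
from typing import Optional

import pandas as pd

LIFTS = ("Squat", "Bench", "Deadlift")


def third_attempt_fail_rates(df: pd.DataFrame, by: Optional[list[str]] = None) -> pd.Series:
    """Share of third attempts missed, per lift (and optionally per group).

    Expects raw signed attempt columns: negative = missed, NaN = not taken.
    """
    frames = []
    for lift in LIFTS:
        col = f"{lift}3Kg"
        # Only attempts that were actually taken count in the denominator.
        taken = df[df[col].notna()]
        frames.append(taken.assign(Lift=lift, Failed=taken[col] < 0))
    long = pd.concat(frames, ignore_index=True)
    keys = ["Lift", *by] if by else "Lift"
    # The mean of a boolean column is the fraction of True values.
    return long.groupby(keys, observed=True)["Failed"].mean()
```

Passing `by=["Sex", "Equipment"]` gives a per-sex, per-equipment breakdown, which is the kind of table needed to check that the bench gap holds across groups.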


Fourth-attempt success rates for Squat, Bench and Deadlift

The 4th attempt: When athletes take a 4th attempt, they succeed about 77% of the time on average, and deadlift leads at 83%. This is the most actionable insight of the whole project: take the 4th attempt.
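Here is a minimal sketch of the 4th-attempt numbers, again on the raw signed columns (`Squat4Kg`, `Bench4Kg`, `Deadlift4Kg`, assumed names). The post doesn't state whether the ~77% average is pooled over all 4th attempts or averaged across lifts. This version returns the per-lift table together with a pooled rate weighted by the number of attempts.

```python
import pandas as pd

LIFTS = ("Squat", "Bench", "Deadlift")


def fourth_attempt_success(df: pd.DataFrame) -> tuple[pd.DataFrame, float]:
    """Per-lift 4th-attempt success rates plus a pooled rate.

    Expects raw signed attempt columns: negative = missed, NaN = not taken.
    """
    rows = []
    for lift in LIFTS:
        taken = df[f"{lift}4Kg"].dropna()
        rows.append({"Lift": lift, "n": len(taken), "made": int((taken > 0).sum())})
    table = pd.DataFrame(rows).set_index("Lift")
    table["success_rate"] = table["made"] / table["n"]
    # Pooled: every 4th attempt weighs the same, whatever the lift.
    pooled = table["made"].sum() / table["n"].sum()
    return table, float(pooled)
```

The unweighted alternative is `table["success_rate"].mean()`. Keeping `n` next to each rate also shows how many 4th attempts each percentage rests on.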


What I learned

About powerlifting
Athletes peak between 22 and 24. Always take the 4th attempt, and make sure you don't fail the 3rd one: it can change the whole competition.

About analysing data
If you have enough data, it may be better not to fill the gaps with artificial values. Also, some features must be built before cleaning, or you'll spend an hour chatting with AI wondering why all your booleans are NaN.

Full code: github.com/rubengil-dev/power_lifting_analisis

Project developed during the Data Science Master at Evolve.
