<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rubén Gil</title>
    <description>The latest articles on DEV Community by Rubén Gil (@rubengildev).</description>
    <link>https://dev.to/rubengildev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948130%2F5d326bbe-698a-4f5d-a491-d5c92953da17.jpeg</url>
      <title>DEV Community: Rubén Gil</title>
      <link>https://dev.to/rubengildev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rubengildev"/>
    <language>en</language>
    <item>
      <title>What 3.9M powerlifting records tell us about competition strategy — an EDA with Python</title>
      <dc:creator>Rubén Gil</dc:creator>
      <pubDate>Sat, 23 May 2026 20:02:03 +0000</pubDate>
      <link>https://dev.to/evolve-space/what-39m-powerlifting-records-tell-us-about-competition-strategy-an-eda-with-python-5g6k</link>
      <guid>https://dev.to/evolve-space/what-39m-powerlifting-records-tell-us-about-competition-strategy-an-eda-with-python-5g6k</guid>
      <description>&lt;p&gt;&lt;strong&gt;What 3.9M powerlifting records tell us about competition strategy — an EDA with Python&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;For the exploratory data analysis (EDA) project in my Data Science Master's at &lt;a href="https://evolve.es" rel="noopener noreferrer"&gt;Evolve&lt;/a&gt;, I picked the Open Powerlifting dataset: beyond being a gym rat, I've always been curious about competition strategy in powerlifting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The dataset
&lt;/h2&gt;

&lt;p&gt;Open Powerlifting is an open-source project that tracks powerlifting competition results worldwide. The full dataset has ~3.9M rows and 42 columns covering athlete info, every single lift attempt, and performance metrics.&lt;/p&gt;

&lt;p&gt;Before any analysis, I filtered it down to sanctioned, drug-tested competitions only and kept just the columns I'd actually use. The main challenge: in the attempt columns, &lt;strong&gt;a negative value means a failed lift&lt;/strong&gt;, not bad data. So I had to build boolean success/failure columns first and only then convert the negatives to NaN.&lt;/p&gt;
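&lt;p&gt;A minimal sketch of that ordering, assuming the OpenPowerlifting attempt columns are named like &lt;code&gt;Squat1Kg&lt;/code&gt; through &lt;code&gt;Deadlift4Kg&lt;/code&gt;. The helpers &lt;code&gt;add_attempt_flags&lt;/code&gt; and &lt;code&gt;negatives_to_nan&lt;/code&gt; are hypothetical names, not the project's own code. Flags are stored as 1.0 / 0.0 / NaN so that an attempt that was never taken is not counted as a failure.&lt;/p&gt;

```python
import pandas as pd

# Assumed OpenPowerlifting column names: Squat1Kg ... Squat4Kg, Bench1Kg ..., Deadlift1Kg ...
LIFTS = ["Squat", "Bench", "Deadlift"]
ATTEMPTS = [1, 2, 3, 4]


def attempt_columns(df):
    """Return the per-attempt weight columns that exist in df."""
    return [
        f"{lift}{n}Kg"
        for lift in LIFTS
        for n in ATTEMPTS
        if f"{lift}{n}Kg" in df.columns
    ]


def add_attempt_flags(df):
    """Build success flags from the signed weights, BEFORE any cleaning.

    Raw convention: positive weight = made, negative = failed, NaN = not taken.
    Each flag (e.g. Bench3_ok) is 1.0 (made), 0.0 (failed) or NaN (not taken).
    """
    df = df.copy()
    for col in attempt_columns(df):
        taken = df[col].notna()
        df[col.replace("Kg", "_ok")] = df[col].gt(0).astype(float).where(taken)
    return df


def negatives_to_nan(df):
    """Replace failed (negative) attempt weights with NaN; the flags keep the outcome."""
    df = df.copy()
    cols = attempt_columns(df)
    df[cols] = df[cols].mask(df[cols].lt(0))
    return df
```

&lt;p&gt;Call &lt;code&gt;add_attempt_flags&lt;/code&gt; before &lt;code&gt;negatives_to_nan&lt;/code&gt;. In the reverse order, every failed attempt is already NaN when the flags are built, so its flag comes out NaN as well.&lt;/p&gt;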




&lt;h2&gt;
  
  
  The process
&lt;/h2&gt;

&lt;p&gt;The project is fully modularized in Python, using pandas, NumPy, seaborn, Matplotlib and pingouin. The pipeline runs end to end from &lt;code&gt;main.py&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;raw CSV → filter → clean → features → assert → analyze&lt;/code&gt;&lt;/p&gt;
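&lt;p&gt;One way such a pipeline could be wired, sketched under assumptions: every stage is a function that takes and returns a DataFrame, and &lt;code&gt;run_pipeline&lt;/code&gt; and &lt;code&gt;assert_stage&lt;/code&gt; are hypothetical names rather than the repository's actual modules. Logging the row count after each stage makes it easy to spot a filter that removes more data than expected.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical stand-in for main.py: every stage takes and returns a DataFrame,
# so stages can be reordered, skipped or tested in isolation.
ATTEMPT_REGEX = r"^(Squat|Bench|Deadlift)[1-4]Kg$"


def assert_stage(df):
    """Fail fast if cleaning left data that would silently skew the analysis."""
    # Explicit raise instead of a bare assert, which `python -O` would skip.
    attempts = df.filter(regex=ATTEMPT_REGEX)
    if attempts.lt(0).any().any():
        raise AssertionError("negative attempt weights remain after cleaning")
    if not df.index.is_unique:
        raise AssertionError("duplicate index labels after filtering")
    return df


def run_pipeline(df, stages):
    """Apply (name, stage) pairs in order, printing the row count after each."""
    for name, stage in stages:
        df = stage(df)
        print(f"{name:10s} rows={len(df):,}")
    return df


# Usage with hypothetical stage functions and file name:
# df = run_pipeline(pd.read_csv("openpowerlifting.csv", low_memory=False),
#                   [("filter", filter_stage), ("clean", clean_stage),
#                    ("features", features_stage), ("assert", assert_stage)])
```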

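&lt;p&gt;Imputation was deliberately conservative: age was filled from the AgeClass range, bodyweight from the weight class, and no synthetic values were generated. Remaining NaN values were filtered per question, so each analysis only drops rows that are missing the columns it actually uses.&lt;/p&gt;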
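&lt;p&gt;A sketch of that kind of imputation. The exact rules are assumptions, not necessarily the ones used in the project: AgeClass strings like &lt;code&gt;24-34&lt;/code&gt; map to their midpoint (the open-ended &lt;code&gt;80-999&lt;/code&gt; class to its lower bound), and WeightClassKg values like &lt;code&gt;93&lt;/code&gt; or &lt;code&gt;120+&lt;/code&gt; map to the class limit. &lt;code&gt;rows_for&lt;/code&gt; is a hypothetical helper for the per-question NaN filter.&lt;/p&gt;

```python
import numpy as np
import pandas as pd


def impute_age(df):
    """Fill missing Age from AgeClass (assumed format like '24-34').

    Uses the midpoint of the range; the open-ended '80-999' class falls back
    to its lower bound. Rows without a usable AgeClass stay NaN.
    """
    bounds = df["AgeClass"].astype(str).str.extract(r"^(\d+)-(\d+)$").astype(float)
    lo, hi = bounds[0], bounds[1]
    estimate = pd.Series(np.where(hi.eq(999), lo, (lo + hi) / 2), index=df.index)
    return df.assign(Age=df["Age"].fillna(estimate))


def impute_bodyweight(df):
    """Fill missing BodyweightKg from WeightClassKg (assumed '93' or '120+').

    A class like '93' is used as-is (its upper limit) and a '+' class uses its
    lower bound, so the estimate is coarse by design. Unparseable values stay NaN.
    """
    limit = pd.to_numeric(df["WeightClassKg"].astype(str).str.rstrip("+"), errors="coerce")
    return df.assign(BodyweightKg=df["BodyweightKg"].fillna(limit))


def rows_for(df, columns):
    """Per-question NaN filter: drop only rows missing one of these columns."""
    return df.dropna(subset=columns)
```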




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqc7cvcav5uaidjpe614i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqc7cvcav5uaidjpe614i.png" alt="DOTS metric vs age to show performance by age and sex" width="799" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Peak performance age:&lt;/strong&gt; Athletes peak between 22 and 24 years old and decline steadily after that. Once performance is normalized by bodyweight (DOTS), there is no meaningful difference between men and women.&lt;/p&gt;
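&lt;p&gt;A sketch of how the peak age could be read off the data, assuming the &lt;code&gt;Sex&lt;/code&gt;, &lt;code&gt;Age&lt;/code&gt; and &lt;code&gt;Dots&lt;/code&gt; columns. Taking the median Dots per whole year of age and requiring a minimum bin size (&lt;code&gt;min_lifters&lt;/code&gt;, a hypothetical parameter) are illustrative choices; the chart above may have been built differently.&lt;/p&gt;

```python
import numpy as np
import pandas as pd


def peak_age_by_sex(df, min_lifters=100):
    """Whole-year age with the highest median Dots score, per sex.

    Ages are floored to whole years; age bins with fewer than `min_lifters`
    rows are dropped so sparse extremes cannot win.
    """
    data = df.dropna(subset=["Sex", "Age", "Dots"])
    data = data.assign(Age=np.floor(data["Age"]))
    curve = data.groupby(["Sex", "Age"])["Dots"].agg(["median", "size"])
    curve = curve.loc[curve["size"].ge(min_lifters), "median"]
    # Rows: age, columns: sex; idxmax returns the peak age for each sex.
    return curve.unstack("Sex").idxmax()
```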




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0imcoyq0clik93pnskdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0imcoyq0clik93pnskdg.png" alt="Fail rates at third lift for Squat, Bench and Deadlift" width="799" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where do athletes fail most?&lt;/strong&gt; Bench press has a &lt;strong&gt;54% fail rate&lt;/strong&gt; on the 3rd attempt, while squat and deadlift sit around 36-40%. The gap holds across sexes and equipment types: bench simply behaves differently.&lt;/p&gt;
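&lt;p&gt;One reasonable way to compute these rates is over attempts that were actually taken, using the raw signed weights before negatives become NaN. A minimal sketch, assuming columns &lt;code&gt;Squat3Kg&lt;/code&gt;, &lt;code&gt;Bench3Kg&lt;/code&gt; and &lt;code&gt;Deadlift3Kg&lt;/code&gt;; &lt;code&gt;third_attempt_fail_rates&lt;/code&gt; is a hypothetical helper, and its &lt;code&gt;by&lt;/code&gt; argument lets the same numbers be split by sex or equipment.&lt;/p&gt;

```python
import pandas as pd

LIFTS = ["Squat", "Bench", "Deadlift"]


def third_attempt_fail_rates(df, by=None):
    """Share of *taken* 3rd attempts that failed, per lift.

    Expects raw signed weights (negative = failed, NaN = not taken), so run it
    before negatives are converted to NaN. `by` is an optional list of
    grouping columns, e.g. ["Sex", "Equipment"].
    """
    rates = {}
    for lift in LIFTS:
        col = f"{lift}3Kg"
        taken = df.loc[df[col].notna()]
        if by:
            grouped = taken.assign(failed=taken[col].lt(0)).groupby(by)
            rates[lift] = grouped["failed"].mean()
        else:
            rates[lift] = taken[col].lt(0).mean()
    return pd.DataFrame(rates) if by else pd.Series(rates)
```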




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hls5sep3unkziw5jh0c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hls5sep3unkziw5jh0c.png" alt="Success rate for 4th attempt for Squat, Bench and Deadlift" width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 4th attempt:&lt;/strong&gt; When athletes take a 4th attempt, they make it &lt;strong&gt;~77% of the time&lt;/strong&gt; on average, with deadlift leading at 83%. This is the most actionable insight of the whole project: just take the 4th attempt.&lt;/p&gt;
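&lt;p&gt;A similar sketch for the 4th attempt, assuming columns &lt;code&gt;Squat4Kg&lt;/code&gt;, &lt;code&gt;Bench4Kg&lt;/code&gt; and &lt;code&gt;Deadlift4Kg&lt;/code&gt; with NaN where no 4th attempt was taken. &lt;code&gt;fourth_attempt_summary&lt;/code&gt; is a hypothetical helper; it returns both per-lift rates and a pooled rate, since an "average" success rate can mean either.&lt;/p&gt;

```python
import pandas as pd

LIFTS = ["Squat", "Bench", "Deadlift"]


def fourth_attempt_summary(df):
    """Per-lift success rate of 4th attempts that were actually taken.

    Expects raw signed weights (positive = made, negative = failed,
    NaN = no 4th attempt). Also returns the pooled rate across all lifts.
    """
    per_lift = {}
    pooled = []
    for lift in LIFTS:
        made = df[f"{lift}4Kg"].dropna().gt(0)
        per_lift[lift] = {"taken": len(made), "success_rate": made.mean()}
        pooled.append(made)
    table = pd.DataFrame.from_dict(per_lift, orient="index")
    return table, pd.concat(pooled).mean()
```

&lt;p&gt;Keeping the &lt;code&gt;taken&lt;/code&gt; count next to each rate shows how many attempts every percentage rests on.&lt;/p&gt;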




&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;About powerlifting&lt;/strong&gt;&lt;br&gt;
Athletes peak between 22 and 24. Always take the 4th attempt, and make sure you don't fail the 3rd one: it can change the whole competition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About analysing data&lt;/strong&gt;&lt;br&gt;
If you have enough data, it's often better not to fill the gaps with artificial values. Also, some features must be built before cleaning, or you'll spend an hour chatting with AI wondering why all your booleans are NaN.&lt;/p&gt;

&lt;p&gt;Full code: &lt;a href="https://github.com/rubengil-dev/power_lifting_analisis" rel="noopener noreferrer"&gt;github.com/rubengil-dev/power_lifting_analisis&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Project developed during the Data Science Master at &lt;a href="https://evolve.es" rel="noopener noreferrer"&gt;Evolve&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>data</category>
      <category>sportscience</category>
    </item>
  </channel>
</rss>
