<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Felipe Bojorque</title>
    <description>The latest articles on DEV Community by Felipe Bojorque (@felipebojorquem).</description>
    <link>https://dev.to/felipebojorquem</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3925400%2Fcae3079f-32a0-4941-aab8-f1cb4e8c141f.png</url>
      <title>DEV Community: Felipe Bojorque</title>
      <link>https://dev.to/felipebojorquem</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/felipebojorquem"/>
    <language>en</language>
    <item>
      <title>How I Found Machine Failure Patterns in Industrial Sensor Data — Predictive Maintenance EDA</title>
      <dc:creator>Felipe Bojorque</dc:creator>
      <pubDate>Mon, 11 May 2026 16:41:52 +0000</pubDate>
      <link>https://dev.to/evolve-space/how-i-found-machine-failure-patterns-in-industrial-sensor-data-predictive-maintenance-eda-49ih</link>
      <guid>https://dev.to/evolve-space/how-i-found-machine-failure-patterns-in-industrial-sensor-data-predictive-maintenance-eda-49ih</guid>
      <description>&lt;p&gt;Predictive maintenance is one of the highest-impact applications of data science &lt;br&gt;
in industry — and one of the least saturated. As a mechatronics engineer with &lt;br&gt;
real experience in industrial plant maintenance, I wanted to build a project that &lt;br&gt;
reflects what actual failure analysis looks like, not just a generic ML exercise.&lt;/p&gt;

&lt;p&gt;This is the EDA (Exploratory Data Analysis) phase of a full predictive maintenance &lt;br&gt;
pipeline I'm building as part of my Master's in Data Science &amp;amp; AI.&lt;/p&gt;

&lt;h2&gt;The Dataset&lt;/h2&gt;

&lt;p&gt;I used the AI4I 2020 Predictive Maintenance dataset from the UCI Machine Learning Repository — 10,000 records of synthetic industrial sensor data with 5 labeled failure types: Tool Wear Failure (TWF), Heat Dissipation Failure (HDF), Power Failure (PWF), Overstrain Failure (OSF), and Random Failure (RNF).&lt;/p&gt;

&lt;p&gt;The dataset is highly imbalanced: only 3.39% of records are failures (339 out of 10,000). That's realistic — in real plants, failures are rare events, and that imbalance has direct implications for modeling (SMOTE or &lt;code&gt;class_weight&lt;/code&gt; will be required).&lt;/p&gt;
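&lt;p&gt;A minimal sketch of what that reweighting could look like — scikit-learn isn't in the article's stated stack, so treat the library and model choice here as illustrative assumptions:&lt;/p&gt;

```python
# Illustrative only: reproducing the 339/10,000 class ratio and computing
# balanced class weights. scikit-learn is an assumed dependency here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 9661 + [1] * 339)  # 3.39% failures, as in AI4I 2020

weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)
print(dict(zip([0, 1], weights)))  # failures weighted roughly 14.7, normal roughly 0.52

# Passing class_weight="balanced" bakes the same reweighting into the model
clf = RandomForestClassifier(class_weight="balanced", random_state=42)
```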

&lt;h2&gt;The Process&lt;/h2&gt;

&lt;p&gt;Beyond standard EDA, I engineered three physically grounded features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;temp_delta&lt;/strong&gt;: process temperature minus air temperature — a proxy for thermal 
stress and heat dissipation capacity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;power_W&lt;/strong&gt;: mechanical power estimated as P = τ × ω — captures the actual 
operating regime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;wear_rate&lt;/strong&gt;: tool wear normalized by power — flags degraded tools running 
under demanding conditions&lt;/li&gt;
&lt;/ul&gt;
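&lt;p&gt;In pandas, the three features reduce to a few lines. The column names below follow the raw AI4I 2020 schema and are an assumption about the repo's exact naming:&lt;/p&gt;

```python
# Sketch of the three engineered features; column names follow the raw
# AI4I 2020 dataset and may differ from the repo's cleaned schema.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Air temperature [K]": [298.1, 298.2],
    "Process temperature [K]": [308.6, 308.7],
    "Rotational speed [rpm]": [1551, 1408],
    "Torque [Nm]": [42.8, 46.3],
    "Tool wear [min]": [0, 3],
})

# Thermal stress proxy: process temperature minus air temperature
df["temp_delta"] = df["Process temperature [K]"] - df["Air temperature [K]"]

# Mechanical power P = tau * omega, converting rpm to rad/s
df["power_W"] = (
    df["Torque [Nm]"] * df["Rotational speed [rpm]"] * 2 * np.pi / 60
)

# Tool wear normalized by the power it was accumulated under
df["wear_rate"] = df["Tool wear [min]"] / df["power_W"]
```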

&lt;p&gt;The full pipeline (load → clean → validate → features → Parquet export → profiling report) runs reproducibly via a single &lt;code&gt;uv run python main.py&lt;/code&gt; command. Stack: Python 3.11, pandas, plotly, seaborn, pandera, loguru, pyarrow, ydata-profiling, ruff, pytest.&lt;/p&gt;
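&lt;p&gt;The validate step deserves a closer look. The repo uses pandera for it; the plain-pandas stand-in below only illustrates the kind of physical sanity checks involved, and the specific checks are my assumptions rather than the repo's schema:&lt;/p&gt;

```python
# Plain-pandas stand-in for the pandera validation step; the checks
# themselves are illustrative assumptions, not the repo's schema.
import pandas as pd

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast on physically impossible sensor readings."""
    if not df["Tool wear [min]"].ge(0).all():
        raise ValueError("negative tool wear")
    if not df["Rotational speed [rpm]"].gt(0).all():
        raise ValueError("non-positive rotational speed")
    if not df["Process temperature [K]"].ge(df["Air temperature [K]"]).all():
        raise ValueError("process temperature below ambient")
    return df
```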

&lt;h2&gt;Key Findings&lt;/h2&gt;

&lt;p&gt;Each failure type has a different dominant predictor — which is exactly what you'd expect from a real industrial system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HDF (Heat Dissipation Failure)&lt;/strong&gt; is predicted by &lt;code&gt;temp_delta&lt;/code&gt;. Normal operation sits at a median of 9.8 K; HDF cases drop to 8.3 K (a 1.5 K reduction). The low variance in the failure group (std = 0.28 K) suggests a near-deterministic activation threshold around 8.5 K.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TWF (Tool Wear Failure)&lt;/strong&gt; shows near-perfect separation: 100% of TWF failures occur above 198 minutes of accumulated wear, while 75% of normal operation stays below 162 min. This isn't gradual degradation — it's threshold behavior, which means a rule-based alert could catch most TWF cases before they happen.&lt;/p&gt;
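&lt;p&gt;Such an alert can be a one-liner. The 8-minute safety margin below is an arbitrary assumption for illustration, not a tuned value:&lt;/p&gt;

```python
# Hypothetical rule-based alert built on the empirical TWF floor of
# 198 min of accumulated wear; the safety margin is an assumption.
import pandas as pd

TWF_WEAR_FLOOR_MIN = 198   # no TWF observed below this wear level
SAFETY_MARGIN_MIN = 8      # warn slightly before the observed floor

def twf_alert(tool_wear_min: pd.Series) -> pd.Series:
    """Flag tools approaching the empirical TWF wear threshold."""
    return tool_wear_min.ge(TWF_WEAR_FLOOR_MIN - SAFETY_MARGIN_MIN)
```

&lt;p&gt;For example, &lt;code&gt;twf_alert(pd.Series([100, 195, 210]))&lt;/code&gt; flags only the last two readings.&lt;/p&gt;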

&lt;p&gt;&lt;strong&gt;PWF/OSF&lt;/strong&gt; cluster in a danger zone at low RPM and high torque (1,401–1,700 rpm × 61–80 Nm), where failure rates reach 71.4%. The Torque↔RPM correlation of -0.88 confirms the machine operates at approximately constant power, validating &lt;code&gt;power_W&lt;/code&gt; as a representative feature of the operating regime.&lt;/p&gt;
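&lt;p&gt;The same rule-based idea applies to the PWF/OSF danger zone. The bin edges below come straight from the quoted range; treating them as inclusive hard bounds is my assumption:&lt;/p&gt;

```python
# Flags the low-RPM / high-torque region where the EDA saw failure
# rates of 71.4%; inclusive bin edges are an assumption.
import numpy as np
import pandas as pd

def in_danger_zone(rpm: pd.Series, torque_nm: pd.Series) -> pd.Series:
    """True where operation falls in the 1,401-1,700 rpm x 61-80 Nm bin."""
    return np.logical_and(rpm.between(1401, 1700), torque_nm.between(61, 80))
```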

&lt;p&gt;Product type also matters: type L fails at 3.92% vs 2.09% for type H — 1.88x higher — independently of the tool wear distribution.&lt;/p&gt;

&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;The most important insight wasn't statistical — it was structural: &lt;strong&gt;failure subtypes sum to 373 events while total Machine failure = 339&lt;/strong&gt;, confirming that simultaneous multi-mode failures exist in the dataset. This has direct implications for modeling: a single binary classifier isn't enough; a multi-label approach will be needed.&lt;/p&gt;
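&lt;p&gt;One sketch of that multi-label setup: one binary target per failure mode, so a record failing in two modes simply carries two positive labels. The per-mode random forest choice is an assumption — the repo's eventual model may differ:&lt;/p&gt;

```python
# Illustrative multi-label setup: one binary column per failure mode.
# The model choice (per-mode random forests) is an assumption.
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

FAILURE_MODES = ["TWF", "HDF", "PWF", "OSF", "RNF"]

# y becomes an (n_samples, 5) binary matrix; simultaneous failures are
# just rows with more than one 1, so no information is lost.
clf = MultiOutputClassifier(
    RandomForestClassifier(class_weight="balanced", random_state=42)
)
# clf.fit(X, y[FAILURE_MODES]) would train five per-mode classifiers
```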

&lt;p&gt;I also learned that physically meaningful features outperform arbitrary transformations. Being able to explain &lt;code&gt;temp_delta&lt;/code&gt; or &lt;code&gt;power_W&lt;/code&gt; in a technical interview — with the physics behind them — is a stronger signal than a feature with better correlation but no domain justification.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;This EDA is Phase 1 of a full PdM (Predictive Maintenance) system:&lt;br&gt;
→ Advanced feature engineering&lt;br&gt;
→ Multi-class failure classifier (Random Forest / XGBoost + SHAP)&lt;br&gt;
→ RUL (Remaining Useful Life) prediction&lt;br&gt;
→ FastAPI deployment + Docker&lt;br&gt;
→ MLflow monitoring + drift detection&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full repo:&lt;/strong&gt; &lt;a href="https://github.com/felipebojorquem/Proyecto-Master-DataScience-Evolve-FelipeBojorque" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Project developed as part of the Master's in Data Science &amp;amp; AI at Evolve.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>pandas</category>
    </item>
  </channel>
</rss>
