<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Larry Barrow</title>
    <description>The latest articles on DEV Community by Larry Barrow (@dale21certs).</description>
    <link>https://dev.to/dale21certs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3825908%2F00436e10-ac95-4be0-ba25-59ed22fa2bb6.png</url>
      <title>DEV Community: Larry Barrow</title>
      <link>https://dev.to/dale21certs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dale21certs"/>
    <language>en</language>
    <item>
      <title>The Fairness Metrics Your ML Model Needs - And Why Accuracy Isn't One of Them</title>
      <dc:creator>Larry Barrow</dc:creator>
      <pubDate>Tue, 07 Apr 2026 15:00:55 +0000</pubDate>
      <link>https://dev.to/dale21certs/the-fairness-metrics-your-ml-model-needs-and-why-accuracy-isnt-one-of-them-5eb</link>
      <guid>https://dev.to/dale21certs/the-fairness-metrics-your-ml-model-needs-and-why-accuracy-isnt-one-of-them-5eb</guid>
      <description>&lt;p&gt;Your fraud detection model hits 99.8% accuracy. Ship it?&lt;/p&gt;

&lt;p&gt;Not so fast. That number means your model predicts "not fraud" for every single transaction — and it's right 99.8% of the time because only 0.2% of transactions are actually fraudulent. It catches exactly zero fraud cases. Accuracy told you everything was fine. It was lying.&lt;/p&gt;

&lt;p&gt;This is the class imbalance trap, and it's the most common evaluation mistake I see teams make when deploying ML models into production. But it's just the beginning. Even when you move past accuracy to better metrics, there's a harder question most teams never ask: &lt;strong&gt;is my model fair?&lt;/strong&gt;&lt;/p&gt;
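
&lt;p&gt;The trap is easy to reproduce. A minimal sketch with made-up counts matching the 0.2% fraud rate above:&lt;/p&gt;

```python
# 10,000 transactions, 20 of them (0.2%) fraudulent.
labels = [1] * 20 + [0] * 9980           # 1 = fraud, 0 = legitimate
predictions = [0] * len(labels)          # model that always says "not fraud"

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
frauds_caught = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)

print(f"accuracy: {accuracy:.1%}")        # 99.8% -- looks great
print(f"frauds caught: {frauds_caught}")  # 0 -- useless
```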

&lt;h2&gt;The Four Metrics You Actually Need&lt;/h2&gt;

&lt;p&gt;Before we talk about fairness, let's fix the basics. For any classification problem — fraud detection, loan approval, medical screening, content moderation — you need to understand four numbers from the confusion matrix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;True Positives (TP):&lt;/strong&gt; Model said yes, answer was yes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;True Negatives (TN):&lt;/strong&gt; Model said no, answer was no.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;False Positives (FP):&lt;/strong&gt; Model said yes, answer was no. (Type I error)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;False Negatives (FN):&lt;/strong&gt; Model said no, answer was yes. (Type II error)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From these, three metrics matter far more than accuracy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Precision&lt;/strong&gt; = TP / (TP + FP) — "Of everything the model flagged, how much was real?"&lt;/p&gt;

&lt;p&gt;High precision means fewer false alarms. Optimize for this when false positives are expensive. Example: spam filtering. Losing a legitimate email to the spam folder is worse than letting a spam message through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recall&lt;/strong&gt; = TP / (TP + FN) — "Of everything that was actually positive, how much did the model catch?"&lt;/p&gt;

&lt;p&gt;High recall means fewer missed cases. Optimize for this when false negatives are dangerous. Example: cancer screening. Missing a malignant tumor is far worse than a false alarm that leads to an additional test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;F1 Score&lt;/strong&gt; = 2 × (Precision × Recall) / (Precision + Recall) — The harmonic mean that balances both.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;precision and recall are in tension&lt;/strong&gt;. Lowering your classification threshold catches more positives (higher recall) but also flags more negatives incorrectly (lower precision). The right balance depends entirely on your business context and the cost of each error type.&lt;/p&gt;
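
&lt;p&gt;The three formulas translate directly into code. A minimal sketch; the confusion-matrix counts are hypothetical:&lt;/p&gt;

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical fraud model: 15 frauds caught, 30 false alarms, 5 missed.
p, r, f1 = precision_recall_f1(tp=15, fp=30, fn=5)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

&lt;p&gt;Note the asymmetry: this hypothetical model catches 75% of fraud, but two thirds of its alerts are false alarms.&lt;/p&gt;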

&lt;h2&gt;The Threshold Decision That Changes Everything&lt;/h2&gt;

&lt;p&gt;Most models output a probability between 0 and 1. You choose a threshold (typically 0.5) above which you predict "positive." But 0.5 is arbitrary. The right threshold depends on the relative cost of errors:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Threshold Strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cancer screening&lt;/td&gt;
&lt;td&gt;Recall&lt;/td&gt;
&lt;td&gt;Lower threshold — don't miss cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Email spam filter&lt;/td&gt;
&lt;td&gt;Precision&lt;/td&gt;
&lt;td&gt;Higher threshold — don't lose real email&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fraud detection&lt;/td&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;Analyze cost matrix: cost of fraud vs. cost of investigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Loan approval&lt;/td&gt;
&lt;td&gt;Context-dependent&lt;/td&gt;
&lt;td&gt;Regulatory requirements may dictate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is where AUC-ROC becomes useful — it measures model performance across &lt;em&gt;all&lt;/em&gt; thresholds, giving you a single number (0.5 = random, 1.0 = perfect) that captures discrimination ability independent of threshold choice.&lt;/p&gt;
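
&lt;p&gt;Before reaching for AUC, the threshold tradeoff itself is easy to demonstrate by sweeping a few cutoffs over toy scores (the numbers are purely illustrative):&lt;/p&gt;

```python
# (model probability, true label) pairs -- purely illustrative.
scored = [(0.95, 1), (0.80, 1), (0.70, 0), (0.60, 1),
          (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0)]
total_pos = sum(y for _, y in scored)

def metrics_at(threshold):
    tp = sum(1 for s, y in scored if s >= threshold and y == 1)
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / total_pos
    return precision, recall

for t in (0.75, 0.50, 0.25):
    precision, recall = metrics_at(t)
    print(f"threshold={t:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

&lt;p&gt;Lowering the threshold from 0.75 to 0.25 lifts recall from 0.50 to 1.00 while precision falls from 1.00 to 0.67: the tension in miniature.&lt;/p&gt;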

&lt;h2&gt;Now the Hard Part: Is Your Model Fair?&lt;/h2&gt;

&lt;p&gt;Here's where most teams stop. They pick the right metric, tune the threshold, hit a good F1 score, and deploy. But they never ask: &lt;strong&gt;does the model perform equally well for everyone?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical concern. A widely reported healthcare algorithm used by major US hospitals systematically deprioritized Black patients for additional care — not because it was explicitly designed to discriminate, but because it used healthcare spending as a proxy for illness severity. Since Black patients historically had less access to healthcare spending, the model learned that they were "healthier" and needed less care. The algorithm affected millions of patients.&lt;/p&gt;

&lt;h3&gt;The Proxy Variable Problem&lt;/h3&gt;

&lt;p&gt;The first instinct is to remove protected attributes (race, gender, age) from your feature set. This does not work. Proxy variables reintroduce bias indirectly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ZIP code&lt;/strong&gt; correlates with race due to residential segregation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name patterns&lt;/strong&gt; correlate with gender and ethnicity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Education level&lt;/strong&gt; correlates with socioeconomic background&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Purchase history&lt;/strong&gt; correlates with income and access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You cannot engineer your way out of bias by removing columns. You have to &lt;strong&gt;measure&lt;/strong&gt; it.&lt;/p&gt;

&lt;h3&gt;Fairness Metrics That Matter&lt;/h3&gt;

&lt;p&gt;Here are the metrics you should be computing across demographic groups in any high-stakes model:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demographic Parity:&lt;/strong&gt; Do all groups receive positive predictions at the same rate? &lt;/p&gt;

&lt;p&gt;Check: Is P(ŷ=1 | Group A) ≈ P(ŷ=1 | Group B)?&lt;/p&gt;

&lt;p&gt;Use when equal outcome rates are the goal (e.g., hiring).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Equalized Odds:&lt;/strong&gt; Does the model have equal true positive rates AND equal false positive rates across groups?&lt;/p&gt;

&lt;p&gt;Use when you need accuracy to be consistent for everyone (e.g., medical diagnosis).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Equal Opportunity:&lt;/strong&gt; Does the model have equal true positive rates across groups? (Relaxed version of equalized odds.)&lt;/p&gt;

&lt;p&gt;Use when catching positives equally is the priority (e.g., loan default detection — don't miss defaults more often for one group).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Predictive Parity:&lt;/strong&gt; When the model predicts positive, is it equally likely to be correct across groups?&lt;/p&gt;

&lt;p&gt;Use when positive predictions must be equally trustworthy regardless of group.&lt;/p&gt;
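
&lt;p&gt;All four checks reduce to computing rates per group. A minimal sketch over synthetic records; the group names and outcomes are invented for illustration:&lt;/p&gt;

```python
# Each record: (group, true label, model prediction). Synthetic data.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1), ("A", 1, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0), ("B", 1, 0),
]

def group_rates(group):
    rows = [(y, yhat) for g, y, yhat in records if g == group]
    pos_rate = sum(yhat for _, yhat in rows) / len(rows)   # demographic parity
    tpr = (sum(yhat for y, yhat in rows if y == 1)
           / sum(1 for y, _ in rows if y == 1))            # equal opportunity
    fpr = (sum(yhat for y, yhat in rows if y == 0)
           / sum(1 for y, _ in rows if y == 0))            # + TPR = equalized odds
    return pos_rate, tpr, fpr

for g in ("A", "B"):
    pos_rate, tpr, fpr = group_rates(g)
    print(f"group {g}: positive rate={pos_rate:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")
```

&lt;p&gt;In this toy data, group B receives far fewer positive predictions (0.20 vs 0.60) and has a much lower true positive rate (0.33 vs 0.67): it would fail demographic parity, equal opportunity, and equalized odds.&lt;/p&gt;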

&lt;h3&gt;The Impossibility Theorem You Need to Know&lt;/h3&gt;

&lt;p&gt;Here's the uncomfortable truth: &lt;strong&gt;you cannot satisfy all fairness metrics simultaneously.&lt;/strong&gt; This is mathematically proven (Chouldechova, 2017; Kleinberg et al., 2016). If base rates differ across groups — which they almost always do in real-world data — demographic parity, equalized odds, and predictive parity are mutually exclusive.&lt;/p&gt;

&lt;p&gt;This means fairness is not a technical problem you solve once. It's a &lt;strong&gt;design decision&lt;/strong&gt; you make explicitly, document clearly, and revisit regularly. Which fairness definition matters most for your use case? Who decides? What are the tradeoffs? These questions require human judgment, not just code.&lt;/p&gt;

&lt;h2&gt;A Practical Starting Point&lt;/h2&gt;

&lt;p&gt;If you're deploying a model that affects people's lives — and most production models do, whether you realize it or not — here's a minimum viable fairness workflow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Define your groups.&lt;/strong&gt; Identify the demographic segments relevant to your application. Don't assume you know — consult domain experts and affected communities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Compute disaggregated metrics.&lt;/strong&gt; Don't just report overall F1. Break it down by group. A model with 0.85 F1 overall might have 0.92 for one group and 0.71 for another.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Apply the four-fifths rule as a starting heuristic.&lt;/strong&gt; If any group's selection rate falls below 80% of the highest group's rate, you have a disparity worth investigating.&lt;/p&gt;
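
&lt;p&gt;The four-fifths check is a one-liner per group. A sketch with hypothetical selection rates:&lt;/p&gt;

```python
# Hypothetical selection rates by demographic group.
selection_rates = {"A": 0.40, "B": 0.30, "C": 0.18}

highest = max(selection_rates.values())
ratios = {g: rate / highest for g, rate in selection_rates.items()}
passes = {g: ratio >= 0.8 for g, ratio in ratios.items()}

for g in selection_rates:
    status = "ok" if passes[g] else "investigate"
    print(f"group {g}: ratio={ratios[g]:.2f} -&gt; {status}")
```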

&lt;p&gt;&lt;strong&gt;4. Choose your fairness definition.&lt;/strong&gt; Based on your application context, decide which metric to optimize and document why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Monitor in production.&lt;/strong&gt; Fairness isn't a one-time check. Data distributions shift, user populations change, and new biases can emerge after deployment. Build fairness metrics into your monitoring pipeline alongside performance metrics.&lt;/p&gt;

&lt;p&gt;The tools exist: Microsoft's Fairlearn, Google's What-If Tool, AWS SageMaker Clarify, and IBM's AI Fairness 360 all provide production-ready fairness measurement and mitigation capabilities.&lt;/p&gt;

&lt;h2&gt;Going Deeper&lt;/h2&gt;

&lt;p&gt;Model evaluation and responsible AI are interconnected disciplines — you can't do one well without the other. I've written a more in-depth treatment covering the full evaluation lifecycle, fairness auditing frameworks, calibration analysis, and cross-vendor tooling in my &lt;a href="https://powerkram.com/ai-machine-learning-articles/responsible-ai-ethics" rel="noopener noreferrer"&gt;Responsible AI and Ethics guide&lt;/a&gt;, which is part of a broader AI/ML training series I maintain.&lt;/p&gt;

&lt;p&gt;If this topic resonates, I'd love to hear how your team handles fairness in practice. What fairness definition do you use? Have you hit the impossibility tradeoff in a real project? Drop your experience in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was created with AI assistance for drafting and editing. All technical content reflects my professional experience in ML engineering and has been verified for accuracy.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>responsibleai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Developer and Machine Learning</title>
      <dc:creator>Larry Barrow</dc:creator>
      <pubDate>Sun, 15 Mar 2026 22:57:16 +0000</pubDate>
      <link>https://dev.to/dale21certs/developer-and-machine-learning-dla</link>
      <guid>https://dev.to/dale21certs/developer-and-machine-learning-dla</guid>
      <description>&lt;p&gt;What Developers Should Understand About Machine Learning (Before Touching a Model)&lt;/p&gt;

&lt;p&gt;Most developers don’t struggle with machine learning because the math is hard. They struggle because the explanations are disconnected from real engineering work. After years of helping people ramp up on ML, I’ve learned that the most effective way to teach it is to anchor everything in scenarios, workflows, and constraints — the things developers deal with every day.&lt;/p&gt;

&lt;p&gt;I’m Larry Dale, founder of PowerKram (&lt;a href="https://powerkram.com" rel="noopener noreferrer"&gt;https://powerkram.com&lt;/a&gt;), where I build scenario‑based learning systems for people who want to understand how ML actually works in practice, not just in theory.&lt;/p&gt;

&lt;p&gt;This post is a distilled version of the fundamentals I teach developers who are new to ML or integrating ML into their systems.&lt;/p&gt;

&lt;h2&gt;Why Developers Should Care About ML Fundamentals&lt;/h2&gt;

&lt;p&gt;Even if you’re not training models full‑time, ML concepts show up everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data pipelines&lt;/li&gt;
&lt;li&gt;API integrations&lt;/li&gt;
&lt;li&gt;cloud services that quietly rely on ML&lt;/li&gt;
&lt;li&gt;systems that adapt to user behavior&lt;/li&gt;
&lt;li&gt;automation workflows&lt;/li&gt;
&lt;li&gt;analytics and forecasting features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding ML fundamentals helps developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;design better architectures&lt;/li&gt;
&lt;li&gt;reason about model behavior&lt;/li&gt;
&lt;li&gt;debug data‑driven systems&lt;/li&gt;
&lt;li&gt;evaluate vendor ML services&lt;/li&gt;
&lt;li&gt;avoid common pitfalls around drift, bias, and overfitting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t need to be a data scientist to benefit from ML literacy.&lt;/p&gt;

&lt;h2&gt;1. The Mental Model Shift: Rules → Patterns&lt;/h2&gt;

&lt;p&gt;Traditional programming is explicit:&lt;/p&gt;

&lt;p&gt;Input + Rules → Output&lt;/p&gt;

&lt;p&gt;Machine learning flips that:&lt;/p&gt;

&lt;p&gt;Input + Output → Learned Rules&lt;/p&gt;

&lt;p&gt;This shift is the foundation of ML thinking. Once developers internalize it, the rest of the ecosystem becomes far less mysterious.&lt;/p&gt;
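
&lt;p&gt;The flip is easiest to see in code. A toy contrast, using an invented one-feature spam heuristic:&lt;/p&gt;

```python
# Traditional programming: a human writes the rule.
def is_spam_rule(caps_count):
    return caps_count &gt; 10  # threshold chosen by hand

# ML in miniature: the "rule" (a 1-D threshold) is learned from examples.
examples = [(2, 0), (3, 0), (5, 0), (12, 1), (15, 1), (20, 1)]  # (caps_count, label)

def learn_threshold(data):
    # Midpoint between the highest negative and lowest positive example.
    # Assumes this toy data is cleanly separable.
    max_neg = max(x for x, y in data if y == 0)
    min_pos = min(x for x, y in data if y == 1)
    return (max_neg + min_pos) / 2

threshold = learn_threshold(examples)
print(threshold)  # derived from data rather than written by hand
```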

&lt;h2&gt;2. The Three Learning Styles That Cover 90% of Real Work&lt;/h2&gt;

&lt;p&gt;I frame ML for developers using three practical categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supervised Learning&lt;/strong&gt;&lt;br&gt;
Learn from labeled examples.&lt;br&gt;
Used for: classification, regression, forecasting, scoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unsupervised Learning&lt;/strong&gt;&lt;br&gt;
Find structure in unlabeled data.&lt;br&gt;
Used for: clustering, anomaly detection, dimensionality reduction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
Learn by trial and error.&lt;br&gt;
Used for: optimization, robotics, sequential decision‑making.&lt;/p&gt;

&lt;p&gt;This framing helps developers map problems to ML approaches quickly.&lt;/p&gt;

&lt;h2&gt;3. Classification vs. Regression (Developer Edition)&lt;/h2&gt;

&lt;p&gt;I explain it this way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classification → choose a category&lt;/li&gt;
&lt;li&gt;Regression → predict a number&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples developers immediately recognize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Is this request suspicious?” → classification&lt;/li&gt;
&lt;li&gt;“How long will this job run?” → regression&lt;/li&gt;
&lt;li&gt;“Which product should we recommend?” → classification&lt;/li&gt;
&lt;li&gt;“What will traffic look like next hour?” → regression&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simple distinctions, huge clarity.&lt;/p&gt;

&lt;h2&gt;4. The ML Workflow Mirrors Real Engineering Work&lt;/h2&gt;

&lt;p&gt;Every ML project — whether you’re using Python notebooks, cloud ML services, or custom pipelines — follows the same lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define the problem&lt;/li&gt;
&lt;li&gt;Prepare the data (the longest step by far)&lt;/li&gt;
&lt;li&gt;Train the model&lt;/li&gt;
&lt;li&gt;Evaluate the model&lt;/li&gt;
&lt;li&gt;Deploy the model&lt;/li&gt;
&lt;li&gt;Monitor and maintain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Developers immediately see the parallels with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD&lt;/li&gt;
&lt;li&gt;API lifecycle&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;versioning&lt;/li&gt;
&lt;li&gt;performance tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ML isn’t magic — it’s engineering with statistical components.&lt;/p&gt;

&lt;h2&gt;5. The Bias‑Variance Tradeoff Explained for Engineers&lt;/h2&gt;

&lt;p&gt;I use this analogy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High bias = underfitting = too few parameters&lt;/li&gt;
&lt;li&gt;High variance = overfitting = too many parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s like tuning a system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too simple → can’t capture behavior&lt;/li&gt;
&lt;li&gt;too complex → memorizes noise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finding the balance is part science, part intuition, part iteration.&lt;/p&gt;
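
&lt;p&gt;You can watch both failure modes in a few lines. A sketch on synthetic data, comparing a model that is too simple (always predict the mean) with one that is too complex (memorize the training set):&lt;/p&gt;

```python
import random

random.seed(0)
# True relationship: y = 2x, plus noise.
train = [(x, 2 * x + random.gauss(0, 1)) for x in range(10)]
test = [(x, 2 * x + random.gauss(0, 1)) for x in range(10)]

mean_y = sum(y for _, y in train) / len(train)
memory = dict(train)

def high_bias(x):      # underfits: ignores x entirely
    return mean_y

def high_variance(x):  # overfits: memorizes training points
    return memory.get(x, mean_y)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print("high bias     train/test:", mse(high_bias, train), mse(high_bias, test))
print("high variance train/test:", mse(high_variance, train), mse(high_variance, test))
```

&lt;p&gt;The underfit model is bad everywhere; the memorizer is perfect on training data and worse on anything new. That gap between train and test error is the overfitting signature.&lt;/p&gt;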

&lt;h2&gt;6. Feature Engineering: The Part Developers Excel At&lt;/h2&gt;

&lt;p&gt;Developers are naturally good at feature engineering because it’s basically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data modeling&lt;/li&gt;
&lt;li&gt;transformation&lt;/li&gt;
&lt;li&gt;normalization&lt;/li&gt;
&lt;li&gt;encoding&lt;/li&gt;
&lt;li&gt;domain‑driven design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good features often outperform fancy algorithms.&lt;br&gt;
I’ve seen simple models beat deep models purely because the data was well‑prepared.&lt;/p&gt;
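
&lt;p&gt;In practice it looks like ordinary data wrangling. A sketch turning a raw record into a numeric feature vector; the field names and the 1,000-request cap are invented for illustration:&lt;/p&gt;

```python
# Raw record as it might arrive from a log or API (hypothetical fields).
raw = {"plan": "pro", "signup_ts": 1700000000, "requests_7d": 350}

PLANS = ["free", "pro", "enterprise"]

def to_features(record):
    # One-hot encode the categorical plan field.
    plan_onehot = [1.0 if record["plan"] == p else 0.0 for p in PLANS]
    # Normalize request volume to a 0-1 range (assumed cap of 1,000/week).
    requests_norm = min(record["requests_7d"] / 1000, 1.0)
    return plan_onehot + [requests_norm]

print(to_features(raw))  # e.g. [0.0, 1.0, 0.0, 0.35]
```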

&lt;h2&gt;What I’ll Be Writing About Next&lt;/h2&gt;

&lt;p&gt;I’ll be publishing more posts on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ML fundamentals explained clearly&lt;/li&gt;
&lt;li&gt;real‑world ML workflows&lt;/li&gt;
&lt;li&gt;scenario‑based learning&lt;/li&gt;
&lt;li&gt;cross‑vendor cloud AI concepts&lt;/li&gt;
&lt;li&gt;how developers can integrate ML responsibly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re learning ML or building systems that rely on it, I’d love to hear what topics you want broken down next. Meanwhile, consider reading more on &lt;a href="https://synchronizedsoftware.com/neural-networks/" rel="noopener noreferrer"&gt;neural networks&lt;/a&gt; and &lt;a href="https://synchronizedsoftware.com/machine-learning-fundamentals/" rel="noopener noreferrer"&gt;machine learning&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;— Larry Dale&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
