<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Naomi Jepkorir</title>
    <description>The latest articles on DEV Community by Naomi Jepkorir (@datawithnaomi).</description>
    <link>https://dev.to/datawithnaomi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3140941%2Fa6d40853-f372-4251-b4a4-9b3dae87b8d1.jpg</url>
      <title>DEV Community: Naomi Jepkorir</title>
      <link>https://dev.to/datawithnaomi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/datawithnaomi"/>
    <language>en</language>
    <item>
      <title>Surviving a Kernel Panic: My Ubuntu War Story</title>
      <dc:creator>Naomi Jepkorir</dc:creator>
      <pubDate>Mon, 23 Mar 2026 07:17:40 +0000</pubDate>
      <link>https://dev.to/datawithnaomi/surviving-a-kernel-panic-my-ubuntu-war-story-53j5</link>
      <guid>https://dev.to/datawithnaomi/surviving-a-kernel-panic-my-ubuntu-war-story-53j5</guid>
      <description>&lt;p&gt;When your backlog is full of data science models and software engineering tasks, the last thing you need is your OS failing to boot because of a kernel panic, no?&lt;/p&gt;

&lt;p&gt;Well, this happened to me. I was given the option to reboot, did it the first time, and I was back in after choosing an older kernel version from the menu. It happened a second time, a third time... and I was just okay with it, saying, "I'll fix it later." &lt;br&gt;
This worked like a charm right up until Linux decided it had had enough. It finally threw an &lt;code&gt;Input/output error&lt;/code&gt;, locked the root filesystem &lt;code&gt;/&lt;/code&gt; into &lt;code&gt;Read-only&lt;/code&gt; mode, and locked me out completely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sr79ity097cgbe96614.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sr79ity097cgbe96614.jpeg" alt="A laptop screen showing a Linux terminal failing to execute commands, flooded with " width="720" height="1280"&gt;&lt;/a&gt;&lt;br&gt;
I couldn't even run &lt;code&gt;poweroff&lt;/code&gt; from the terminal.&lt;/p&gt;

&lt;p&gt;The silver lining came when I realized I had a Kali Linux installer ISO sitting on a flash drive nearby. In case you find yourself in this, let's call it "very specific", situation, here is how to perform open-heart surgery on your system before you do anything rash like wiping your drive.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: The Hard Reset &amp;amp; The USB Boot
&lt;/h2&gt;

&lt;p&gt;Since the terminal was completely frozen, a graceful shutdown was out of the question. Long-press the power button to force the PC to power off. Turn it back on and immediately open your startup menu (usually by spamming Esc, F12, or F9 depending on your laptop model). From there, boot directly into your rescue USB.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 2: Dropping into the Underworld (The BusyBox Shell)
&lt;/h2&gt;

&lt;p&gt;Because I was using a Kali Linux installer ISO (not a Live Desktop), there was no beautiful graphical interface to save me. I had to navigate the installer menu and select "Execute a shell".&lt;/p&gt;

&lt;p&gt;This drops you into a raw, stripped-down BusyBox ash shell. From here, we need to find exactly where the broken Ubuntu system lives on the hard drive.&lt;/p&gt;

&lt;p&gt;Run this command to list your partitions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fdisk &lt;span class="nt"&gt;-l&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scan the output for your main Linux filesystem. In my case, Ubuntu was sitting on &lt;code&gt;/dev/sda2&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Find and repair
&lt;/h2&gt;

&lt;p&gt;Because I was forced to hard-reset the machine multiple times while the OS was locked up, the filesystem metadata was corrupted. Files that had been deleted while still open were never cleaned up, leaving "orphaned inodes" behind. When ext4 detects this kind of damage, it typically remounts the filesystem read-only to protect your data, which is exactly the lockout I had hit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: Make sure your broken partition is not mounted before you repair it.&lt;br&gt;
Unmount it just to be safe (if it was never mounted, the command simply complains and you can move on):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;umount /dev/sda2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's time to repair the drive. Run the ext4 filesystem check tool on your specific partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;e2fsck &lt;span class="nt"&gt;-y&lt;/span&gt; /dev/sda2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;em&gt;💡Tip&lt;/em&gt;: The &lt;code&gt;-y&lt;/code&gt; flag is crucial here. It automatically answers "yes" to the hundreds of prompts asking whether to fix individual filesystem inconsistencies. Without it, you will be holding down the 'Y' key for an eternity.)&lt;/p&gt;

&lt;p&gt;Once the screen stops scrolling, you are looking for the holy grail: a message declaring your drive &lt;code&gt;/dev/sda2: clean&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsr48xz63cwl8sjz4r8tu.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsr48xz63cwl8sjz4r8tu.jpeg" alt="A Linux terminal displaying the output of the `e2fsck` command successfully clearing orphaned inodes and reporting the `/dev/sda2` filesystem as clean." width="720" height="1280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: The Boot Menu &amp;amp; The Investigation
&lt;/h2&gt;

&lt;p&gt;With the filesystem repaired, type &lt;code&gt;reboot&lt;/code&gt; (or hard reset again if the shell is stubborn) and unplug the USB.&lt;/p&gt;

&lt;p&gt;Do not let it boot normally. As soon as the PC turns on, spam &lt;code&gt;esc&lt;/code&gt; to bring up the GRUB boot menu. Go to Advanced options for Ubuntu and manually select an older kernel version (e.g., 6.14.x instead of the newest one).&lt;/p&gt;

&lt;p&gt;Once you are successfully booted into your desktop, open a terminal. It is time to find the killer. List all installed kernels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dpkg &lt;span class="nt"&gt;--list&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;linux-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you read the output carefully, you'll likely spot the culprit. Look at the two letters on the far left of the list:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ii&lt;/code&gt; means the package is marked for install (first letter) and fully installed (second letter). This is your stable, older kernel.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;it&lt;/code&gt; means the package is marked for install but is stuck in the triggers-pending state. This is a half-baked, broken kernel.&lt;/p&gt;
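
&lt;p&gt;To make the two-letter codes concrete, here is a minimal Python sketch that flags any kernel package whose status is not &lt;code&gt;ii&lt;/code&gt;. The sample text, its placeholder version columns, and the &lt;code&gt;find_unhealthy&lt;/code&gt; helper are hypothetical illustrations, not output from my machine. On a real system you would feed it the output of the &lt;code&gt;dpkg --list&lt;/code&gt; command above.&lt;/p&gt;

```python
# Hypothetical sample in the column layout that `dpkg --list` prints:
# status code, package name, version, architecture, description.
# Package names and versions are placeholders for illustration only.
SAMPLE = """\
ii  linux-image-6.14.0-0-generic   0.0  amd64  Signed kernel image generic
it  linux-image-6.17.0-19-generic  0.0  amd64  Signed kernel image generic
"""


def find_unhealthy(dpkg_output):
    """Return (status, package) pairs for kernel images whose status is not 'ii'."""
    flagged = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything without a package-name column.
        if not fields[1:]:
            continue
        status, package = fields[0], fields[1]
        if package.startswith("linux-image") and status != "ii":
            flagged.append((status, package))
    return flagged


print(find_unhealthy(SAMPLE))
```

&lt;p&gt;Anything this prints is a kernel that dpkg never finished setting up, and so a candidate for the cleanup in the next step.&lt;/p&gt;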

&lt;p&gt;In my case, the system had tried to automatically update to the 6.17 kernel in the background. But a third-party module, specifically VirtualBox DKMS, failed to compile against the new kernel. That build failure halted the kernel installation halfway through, leaving my machine with an unbootable OS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: The Execution (Purging the Rot)
&lt;/h2&gt;

&lt;p&gt;Now we just need to clean up the mess and permanently delete the broken kernel so the system stops trying to default to it.&lt;/p&gt;

&lt;p&gt;First, get the blocking software out of the way (you can reinstall it later):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt remove &lt;span class="nt"&gt;--purge&lt;/span&gt; virtualbox-dkms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, tell the package manager to unjam itself and fix any half-installed dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nt"&gt;--fix-broken&lt;/span&gt; &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, drop the hammer on the broken kernel (replace the version numbers with the broken one from your dpkg list):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt purge linux-image-6.17.0-19-generic linux-headers-6.17.0-19-generic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, sweep up the orphaned packages and regenerate your boot menu so that GRUB defaults to the stable kernel that remains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt autoremove
&lt;span class="nb"&gt;sudo &lt;/span&gt;update-grub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All good 😊✨&lt;/p&gt;

&lt;h2&gt;
  
  
  A little note...
&lt;/h2&gt;

&lt;p&gt;Isn't it funny how things completely falling apart is usually the best way to figure out how they actually work?&lt;/p&gt;

&lt;p&gt;But hey, in case you found yourself in this situation, found your way here, and this still doesn't work... you should probably just delete that OS 😂 or contact your local AI agent.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>ubuntu</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>RAG for Dummies</title>
      <dc:creator>Naomi Jepkorir</dc:creator>
      <pubDate>Thu, 18 Sep 2025 17:42:36 +0000</pubDate>
      <link>https://dev.to/datawithnaomi/rag-for-dummies-3f2p</link>
      <guid>https://dev.to/datawithnaomi/rag-for-dummies-3f2p</guid>
      <description>&lt;p&gt;If you’ve been following AI news, you’ve probably heard the term &lt;strong&gt;RAG&lt;/strong&gt; popping up everywhere.&lt;br&gt;
No, it’s not about cleaning your house. &lt;strong&gt;RAG&lt;/strong&gt; stands for &lt;strong&gt;Retrieval-Augmented Generation&lt;/strong&gt;, and it’s one of the most exciting techniques in AI right now.&lt;/p&gt;

&lt;p&gt;Let’s break it down so it’s easy to understand, no technical jargon required.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 What is RAG?
&lt;/h2&gt;

&lt;p&gt;Think of RAG as an AI that &lt;strong&gt;does its homework&lt;/strong&gt; before giving you an answer.&lt;br&gt;&lt;br&gt;
Here’s what the name means:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt; – The AI first &lt;strong&gt;looks up relevant information&lt;/strong&gt; from a database, knowledge base, or document collection.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Augmented Generation&lt;/strong&gt; – It then &lt;strong&gt;uses that information&lt;/strong&gt; to generate a complete, accurate answer.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of just guessing based on what it was trained on months or years ago, RAG can stay up to date and grounded in real facts.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivlxtnkx8hvw1ezc2ad1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivlxtnkx8hvw1ezc2ad1.png" alt="AI researching and answering" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ✨ Why RAG Matters
&lt;/h2&gt;

&lt;p&gt;This approach solves some big problems with AI:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Up-to-date knowledge&lt;/strong&gt; – It can pull in the latest information, instead of relying only on what it “remembers.”
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better accuracy&lt;/strong&gt; – By using real sources, it reduces those “hallucinations” where AI just makes stuff up.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customizable&lt;/strong&gt; – You can feed it your own data (like company manuals or research papers), and it will actually use them to answer questions.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi86f19xxpu1uf0y93a49.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi86f19xxpu1uf0y93a49.png" alt="Before/After AI comparison" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠 How RAG Works (Simple Version)
&lt;/h2&gt;

&lt;p&gt;Here’s the process in three steps:  &lt;/p&gt;

&lt;p&gt;1️⃣ &lt;strong&gt;You ask a question.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
2️⃣ &lt;strong&gt;The AI searches&lt;/strong&gt; through a set of documents for the most relevant pieces of information.&lt;br&gt;&lt;br&gt;
3️⃣ &lt;strong&gt;It writes a clear answer&lt;/strong&gt; using what it just found.  &lt;/p&gt;

&lt;p&gt;In short:  &lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;Search + Smart Writing = RAG&lt;/strong&gt;  &lt;/p&gt;
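
&lt;p&gt;For the curious, the three steps can be sketched in a few lines of Python. This is only a toy under loud assumptions: the &lt;code&gt;DOCUMENTS&lt;/code&gt; list and the helper names are made up for illustration, simple word overlap stands in for the smarter search real systems use, and the final call to an AI model is left out, so the sketch stops at the prompt that model would receive.&lt;/p&gt;

```python
# Hypothetical mini knowledge base; a real system would index far more text.
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available by email from Monday to Friday.",
    "Premium accounts include priority shipping on all orders.",
]


def words(text):
    """Lowercase the text and split it into a set of words, ignoring . and ?"""
    return set(text.lower().replace("?", " ").replace(".", " ").split())


def retrieve(question, documents, top_k=1):
    """Step 2: rank documents by how many words they share with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words.intersection(words(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Step 3 (first half): hand the retrieved text to the model with the question."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("What is the refund policy?", DOCUMENTS))
```

&lt;p&gt;The printed prompt contains the refund document and nothing else, which is the whole trick: the model answers from what was just looked up rather than from memory.&lt;/p&gt;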

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8ktar3z19u1shah2p0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8ktar3z19u1shah2p0p.png" alt="RAG process infographic" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🌍 Real-World Examples
&lt;/h2&gt;

&lt;p&gt;You’ve probably already seen RAG in action:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💬 Customer support chatbots that know about your account and can answer detailed questions.
&lt;/li&gt;
&lt;li&gt;📚 Research tools that summarize recent studies for you.
&lt;/li&gt;
&lt;li&gt;🏢 Internal company assistants that help employees find policies or technical documentation instantly.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;RAG makes AI &lt;strong&gt;smarter, more reliable and more helpful&lt;/strong&gt; by letting it look things up before answering.  &lt;/p&gt;

&lt;p&gt;So next time you hear someone talk about “RAG,” you can confidently say:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“It’s when AI searches for relevant info first, then writes a better answer, like having Google and ChatGPT work together.”&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>learning</category>
      <category>machinelearning</category>
      <category>rag</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Understanding Classification in Supervised Learning</title>
      <dc:creator>Naomi Jepkorir</dc:creator>
      <pubDate>Thu, 28 Aug 2025 06:08:31 +0000</pubDate>
      <link>https://dev.to/datawithnaomi/understanding-classification-in-supervised-learning-1h7e</link>
      <guid>https://dev.to/datawithnaomi/understanding-classification-in-supervised-learning-1h7e</guid>
      <description>&lt;p&gt;Machine learning is everywhere today, from Netflix recommendations  to fraud detection .&lt;br&gt;&lt;br&gt;
One of the most important techniques behind these systems is &lt;strong&gt;supervised learning&lt;/strong&gt;, and within that, &lt;strong&gt;classification&lt;/strong&gt; shines as one of the most practical approaches.  &lt;/p&gt;

&lt;p&gt;In this article, I’ll break down:&lt;br&gt;&lt;br&gt;
✨ What supervised learning is&lt;br&gt;&lt;br&gt;
✨ How classification works&lt;br&gt;&lt;br&gt;
✨ Common models for classification&lt;br&gt;&lt;br&gt;
✨ My personal views and insights&lt;br&gt;&lt;br&gt;
✨ Challenges I’ve faced along the way  &lt;/p&gt;




&lt;h2&gt;
  
  
  📘 What is Supervised Learning?
&lt;/h2&gt;

&lt;p&gt;Supervised learning is a type of machine learning where the model is trained on a &lt;strong&gt;labeled dataset&lt;/strong&gt;.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inputs (features):&lt;/strong&gt; The data we feed into the model.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outputs (labels):&lt;/strong&gt; The known answers we want the model to predict.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like teaching a student with flashcards: you show the input (a picture of a cat) and the correct label (“cat”). After enough examples, the student (our model) learns to generalize and can correctly label new, unseen inputs.  &lt;/p&gt;

&lt;p&gt;Supervised learning has two main branches:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Regression&lt;/strong&gt; – Predicting continuous values (e.g., house prices).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classification&lt;/strong&gt; – Predicting categories (e.g., spam vs. not spam).
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here, we’ll focus on &lt;strong&gt;classification&lt;/strong&gt;.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🏷️ How Classification Works
&lt;/h2&gt;

&lt;p&gt;Classification is all about sorting data into categories. Some everyday examples:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Email: &lt;em&gt;spam&lt;/em&gt; or &lt;em&gt;not spam&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Medical scan: &lt;em&gt;benign&lt;/em&gt; or &lt;em&gt;malignant&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Handwritten digit: &lt;em&gt;0–9&lt;/em&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The process usually looks like this:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Collect labeled data&lt;/strong&gt; 🗂️ &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract features&lt;/strong&gt; 🔎
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Train the model&lt;/strong&gt; 🤖
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test/validate&lt;/strong&gt; 📊
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make predictions&lt;/strong&gt; ✅
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At its heart, classification is about drawing boundaries between groups: some models literally draw a line, while others compare similarities like a “nearest neighbor.”  &lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Models Used for Classification
&lt;/h2&gt;

&lt;p&gt;There’s no one-size-fits-all solution. Here are some popular models:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logistic Regression 📉&lt;/strong&gt; – Despite its name, it’s a classification model. Predicts probabilities and assigns labels.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Trees 🌲&lt;/strong&gt; – Splits data by asking “yes/no” questions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Random Forests 🌲🌲🌲&lt;/strong&gt; – A team of decision trees that vote together.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Vector Machines (SVMs)&lt;/strong&gt; – Finds the best dividing line (or hyperplane).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;k-Nearest Neighbors (k-NN)&lt;/strong&gt; – Looks at the neighbors and goes with the majority.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neural Networks 🧠⚡&lt;/strong&gt; – Powerful for images, text and speech, though often harder to interpret. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each comes with trade-offs: some are simple and easy to explain, others are powerful but feel like a black box.  &lt;/p&gt;
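
&lt;p&gt;To show how little machinery the simplest of these needs, here is a minimal k-NN sketch using only the Python standard library. The &lt;code&gt;TRAINING&lt;/code&gt; points and the &lt;code&gt;knn_predict&lt;/code&gt; helper are hypothetical, chosen for illustration. In practice you would more likely reach for a library implementation.&lt;/p&gt;

```python
from collections import Counter
from math import dist

# Hypothetical labeled points: (feature vector, label).
TRAINING = [
    ((1.0, 1.0), "spam"),
    ((1.2, 0.8), "spam"),
    ((0.9, 1.3), "spam"),
    ((4.0, 4.2), "not spam"),
    ((4.5, 3.9), "not spam"),
]


def knn_predict(point, training, k=3):
    """Label a point by majority vote among its k nearest training points."""
    # Sort by Euclidean distance to the query point and keep the k closest.
    nearest = sorted(training, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((1.1, 1.0), TRAINING))
print(knn_predict((4.2, 4.0), TRAINING))
```

&lt;p&gt;There is no training step at all: the "model" is just the stored data plus a distance function, which is why k-NN is easy to explain but slows down as the dataset grows.&lt;/p&gt;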




&lt;h2&gt;
  
  
  💡 My Personal Views and Insights
&lt;/h2&gt;

&lt;p&gt;Over time, I learned:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data quality matters more than the model&lt;/strong&gt;. If the data is messy or biased, results will be too.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature engineering is underrated&lt;/strong&gt;. A simple model with great features can beat a complex one with poor inputs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy isn’t everything&lt;/strong&gt;. In real-world cases, metrics like precision, recall and F1-score often matter more, especially when classes are imbalanced.
&lt;/li&gt;
&lt;/ul&gt;
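
&lt;p&gt;The point about accuracy is easiest to see with numbers. The sketch below uses made-up data (1,000 transactions, 10 of them fraudulent) and a lazy "model" that never predicts fraud. The &lt;code&gt;classification_metrics&lt;/code&gt; helper is hypothetical, and reporting 0.0 when a metric is undefined is a convention I'm assuming here, not a universal rule.&lt;/p&gt;

```python
# Hypothetical toy data: 1,000 transactions, 10 of them fraudulent (label 1).
y_true = [1] * 10 + [0] * 990
# A lazy "model" that predicts "not fraud" for every transaction.
y_pred = [0] * 1000


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # fraud correctly flagged
    fp = pairs.count((0, 1))  # false alarms
    fn = pairs.count((1, 0))  # fraud that slipped through
    tn = pairs.count((0, 0))  # legitimate and left alone
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (assumed convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(classification_metrics(y_true, y_pred))
```

&lt;p&gt;The model scores 99% accuracy while catching zero fraud cases: recall and F1 are both 0, which is the number that matters if fraud is what you care about.&lt;/p&gt;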




&lt;h2&gt;
  
  
  🚧 Challenges I’ve Faced
&lt;/h2&gt;

&lt;p&gt;Here are some hurdles I’ve personally run into:   &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Overfitting&lt;/strong&gt; – When the model memorizes the training data but fails on new inputs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature selection&lt;/strong&gt; – Choosing the right features is tricky: too many = noise, too few = missed signals.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Class imbalance&lt;/strong&gt; – Sometimes one class dominates the dataset, making it harder for the model to detect the minority class (e.g., detecting fraud when only 1% of transactions are fraudulent).&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Classification is one of the most practical parts of supervised learning. From filtering spam to diagnosing diseases, it’s everywhere.  &lt;/p&gt;

&lt;p&gt;For me, working with classification has been both challenging and rewarding. The key lessons?  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Good data beats fancy models.
&lt;/li&gt;
&lt;li&gt;Evaluation metrics must match the real-world problem.
&lt;/li&gt;
&lt;li&gt;Interpretability matters, especially in sensitive applications.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite the hurdles, classification continues to be one of the most impactful tools in machine learning.  &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>learning</category>
    </item>
    <item>
      <title>⚖️ Choosing Between Type I and Type II Errors</title>
      <dc:creator>Naomi Jepkorir</dc:creator>
      <pubDate>Mon, 11 Aug 2025 20:56:55 +0000</pubDate>
      <link>https://dev.to/datawithnaomi/choosing-between-type-i-and-type-ii-errors-58nk</link>
      <guid>https://dev.to/datawithnaomi/choosing-between-type-i-and-type-ii-errors-58nk</guid>
      <description>&lt;p&gt;In statistics, making a decision is a bit like crossing a busy street without traffic lights, you have to weigh the risk of moving too soon against the risk of waiting too long. In hypothesis testing, those two risks are called Type I and Type II errors, and you can’t avoid them both entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  🕵️ Meet the Errors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Type I Error (False Positive) – Rejecting the null hypothesis when it’s actually true. In medicine, this might mean diagnosing a patient with a disease they don’t have.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Type II Error (False Negative) – Failing to reject the null hypothesis when it’s false. In medicine, this might mean missing a diagnosis when the disease is present.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They’re like opposite sides of a see-saw ⚖️: for a fixed sample size, lowering one usually raises the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  🦟 The Malaria Example
&lt;/h2&gt;

&lt;p&gt;Picture yourself in a clinic in a malaria-endemic region. A patient walks in with fever, chills and body aches. You suspect malaria, and you have a rapid test 🧪.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If you make a Type I error, you say they have malaria when they don’t. They take unnecessary medicine, maybe get mild side effects, and the real cause of illness is missed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you make a Type II error, you say they don’t have malaria when they do. Without treatment, the disease can worsen quickly, and in severe cases, become life-threatening.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔄 The Trade-off
&lt;/h2&gt;

&lt;p&gt;In this setting, Type II errors are generally more dangerous 🚨. Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Malaria progresses fast, especially in children and pregnant women.&lt;/li&gt;
&lt;li&gt;Anti-malarial treatment is relatively safe and inexpensive.&lt;/li&gt;
&lt;li&gt;Missing a real case can have far worse consequences than treating a false one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why some clinics treat suspected malaria even when the test is negative but symptoms are strong: better to risk a false positive than lose a life ❤️.&lt;/p&gt;

&lt;h2&gt;
  
  
  🖥️ Simulating Type I and Type II Errors in Python
&lt;/h2&gt;

&lt;p&gt;Here’s a small simulation showing how false positives and false negatives might look in malaria testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Seed for reproducibility
&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Number of patients
&lt;/span&gt;&lt;span class="n"&gt;n_patients&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;

&lt;span class="c1"&gt;# True malaria status (1 = has malaria, 0 = no malaria)
&lt;/span&gt;&lt;span class="n"&gt;true_malaria&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;binomial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_patients&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 30% prevalence
&lt;/span&gt;
&lt;span class="c1"&gt;# Test characteristics
&lt;/span&gt;&lt;span class="n"&gt;sensitivity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;  &lt;span class="c1"&gt;# correctly detect malaria (reduces Type II errors)
&lt;/span&gt;&lt;span class="n"&gt;specificity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.90&lt;/span&gt;  &lt;span class="c1"&gt;# correctly detect no malaria (reduces Type I errors)
&lt;/span&gt;
&lt;span class="c1"&gt;# Simulated test outcomes
&lt;/span&gt;&lt;span class="n"&gt;test_positive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;true_malaria&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# True malaria case: test is positive with probability = sensitivity
&lt;/span&gt;        &lt;span class="n"&gt;test_positive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;sensitivity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# No malaria: false positive occurs with probability = (1 - specificity)
&lt;/span&gt;        &lt;span class="n"&gt;test_positive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;specificity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Convert to NumPy array
&lt;/span&gt;&lt;span class="n"&gt;test_positive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_positive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Count errors
&lt;/span&gt;&lt;span class="n"&gt;type_I_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;test_positive&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;true_malaria&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;type_II_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;test_positive&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;true_malaria&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Type I errors (False Positives): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;type_I_errors&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Type II errors (False Negatives): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;type_II_errors&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Type I errors (False Positives): 73
Type II errors (False Negatives): 18
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
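
&lt;p&gt;As a sanity check on the simulated counts, the expected number of each error follows directly from the parameters used in the simulation. The short sketch below just does that arithmetic. The variable names &lt;code&gt;prevalence&lt;/code&gt;, &lt;code&gt;expected_type_I&lt;/code&gt; and &lt;code&gt;expected_type_II&lt;/code&gt; are mine, not part of the original script.&lt;/p&gt;

```python
# Same parameters as the simulation above.
n_patients = 1000
prevalence = 0.30
sensitivity = 0.95
specificity = 0.90

# Expected false positives: healthy patients who test positive anyway.
expected_type_I = n_patients * (1 - prevalence) * (1 - specificity)
# Expected false negatives: infected patients the test misses.
expected_type_II = n_patients * prevalence * (1 - sensitivity)

print(round(expected_type_I, 1), round(expected_type_II, 1))
```

&lt;p&gt;That works out to about 70 false positives and 15 false negatives on average, so the simulated 73 and 18 are within ordinary random variation for a single run of 1,000 patients.&lt;/p&gt;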



&lt;p&gt;How to use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Change the &lt;code&gt;sensitivity&lt;/code&gt; to see what happens when you try to catch every malaria case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Change the &lt;code&gt;specificity&lt;/code&gt; to see what happens when you try to avoid false alarms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Notice how improving one tends to worsen the other: the eternal trade-off.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
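
&lt;p&gt;To see what each knob does without rerunning the random simulation, here is a minimal sketch that computes the &lt;em&gt;expected&lt;/em&gt; error counts directly. The helper name &lt;code&gt;expected_errors&lt;/code&gt;, the population of 1,000 and the 20% prevalence are illustrative assumptions, not the values used in the simulation above.&lt;/p&gt;

```python
# Hypothetical parameters: the population size and prevalence used below are
# illustrative assumptions, not the values from the simulation above.
def expected_errors(n_people, prevalence, sensitivity, specificity):
    """Return the expected (false positives, false negatives)."""
    n_sick = n_people * prevalence
    n_healthy = n_people - n_sick
    false_positives = n_healthy * (1 - specificity)  # Type I errors
    false_negatives = n_sick * (1 - sensitivity)     # Type II errors
    return false_positives, false_negatives

for sens, spec in [(0.90, 0.95), (0.99, 0.95), (0.90, 0.99)]:
    fp, fn = expected_errors(1000, 0.20, sens, spec)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f} -> "
          f"expected FP={fp:.1f}, expected FN={fn:.1f}")
```

&lt;p&gt;In this formula the two knobs move independently. With a real diagnostic test, both usually come from a single decision cutoff, which is where the trade-off bites.&lt;/p&gt;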

&lt;h2&gt;
  
  
  📊 The Balancing Act
&lt;/h2&gt;

&lt;p&gt;The choice between minimizing Type I or Type II errors depends on the context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When the cost of a false positive is high (e.g., invasive surgery, expensive drugs), reduce Type I errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the cost of a false negative is high (e.g., fast-progressing diseases), reduce Type II errors.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can’t drive both to zero at once, so you set your alpha (the Type I error rate) and your power (1 − β, where β is the Type II error rate) based on the stakes.&lt;/p&gt;
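
&lt;p&gt;The link between alpha and power can be sketched with a generic one-sided z-test. This is not the malaria simulation: the helper name &lt;code&gt;type_ii_error_rate&lt;/code&gt;, the effect size of 0.5 standard deviations and the sample size of 25 are all hypothetical values chosen for illustration.&lt;/p&gt;

```python
from statistics import NormalDist

def type_ii_error_rate(alpha, effect_size, n):
    """Beta for a one-sided z-test with known unit variance.

    effect_size is the true mean shift in standard-deviation units
    (a hypothetical value chosen for illustration).
    """
    z = NormalDist()                   # standard normal distribution
    critical = z.inv_cdf(1 - alpha)    # rejection cutoff under H0
    shift = effect_size * n ** 0.5     # mean of the test statistic under H1
    return z.cdf(critical - shift)     # P(fail to reject | H1 is true)

for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error_rate(alpha, effect_size=0.5, n=25)
    print(f"alpha={alpha:.2f} -> beta={beta:.3f}, power={1 - beta:.3f}")
```

&lt;p&gt;With the sample size held fixed, a stricter alpha pushes beta up, so fewer false alarms are paid for with more missed cases.&lt;/p&gt;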

&lt;h2&gt;
  
  
  💡 Final Thought
&lt;/h2&gt;

&lt;p&gt;Choosing between Type I and Type II errors isn’t about perfection; it’s about priorities. In malaria diagnosis, the priority is saving lives, even if it means some people take medicine they don’t actually need.&lt;/p&gt;

&lt;p&gt;In every field, the key question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Which mistake can we live with, and which can we not afford to make?” 🤔&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>datascience</category>
      <category>statistics</category>
      <category>python</category>
      <category>learning</category>
    </item>
    <item>
      <title>⚽ Can We Predict the Next Premier League Champion with Binomial Probability?</title>
      <dc:creator>Naomi Jepkorir</dc:creator>
      <pubDate>Wed, 06 Aug 2025 12:21:06 +0000</pubDate>
      <link>https://dev.to/datawithnaomi/can-we-predict-the-next-premier-league-champion-with-binomial-probability-1gjg</link>
      <guid>https://dev.to/datawithnaomi/can-we-predict-the-next-premier-league-champion-with-binomial-probability-1gjg</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;What are the chances your favorite EPL team wins the league next season? Time to let math do the talking!&lt;/em&gt; 🎲&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🧠 Idea Behind the Madness
&lt;/h2&gt;

&lt;p&gt;Every football fan has asked it:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;"Can my team win the league next season?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of relying on blind hope, I decided to use &lt;strong&gt;binomial probability&lt;/strong&gt; to calculate each team's chances of taking the crown in the next Premier League season, based entirely on how they performed last time.&lt;/p&gt;

&lt;p&gt;We’ll:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetch last season’s final standings from an API.&lt;/li&gt;
&lt;li&gt;Use the binomial distribution to calculate two things:

&lt;ul&gt;
&lt;li&gt;The probability of a team repeating its exact win total.&lt;/li&gt;
&lt;li&gt;The probability of a team reaching the typical &lt;strong&gt;championship threshold&lt;/strong&gt;, which is ≈27.6 wins, rounded up to &lt;strong&gt;28 wins&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Rank them accordingly.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  🛰️ Step 1: Fetching EPL Data Using an API
&lt;/h2&gt;

&lt;p&gt;I used the &lt;a href="https://www.football-data.org/" rel="noopener noreferrer"&gt;football-data.org&lt;/a&gt; API to pull the standings. You’ll need a free API token; save it in a &lt;code&gt;.env&lt;/code&gt; file like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;API_TOKEN&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;your_football_data_token&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now fetch the standings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_epl_standings&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_TOKEN not found in env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.football-data.org/v4/competitions/PL/standings?season=2024&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;X-Auth-Token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API request failed with status code &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;standings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_epl_standings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Convert to DataFrame:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;data_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;standings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data_rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;position&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Matches&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;playedGames&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wins&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;won&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draw&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Losses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Points&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;points&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+/-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goalDifference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goals&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goalsFor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goalsAgainst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epl_standings.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🎯 Step 2: Binomial Probability of Exact Win Count
&lt;/h2&gt;

&lt;p&gt;Now let's calculate the probability of each team repeating the exact number of wins they had last season.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="c1"&gt;# Loop through each row and calculate binomial probability
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Team&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Matches&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# total games
&lt;/span&gt;    &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Wins&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;     &lt;span class="c1"&gt;# wins
&lt;/span&gt;    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;                &lt;span class="c1"&gt;# estimated win probability
&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;binom_prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;OverflowError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;binom_prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: P( &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; wins)  = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;binom_prob&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Liverpool FC: P( 25 wins)  = 0.135388
Arsenal FC: P( 20 wins)  = 0.128761
Ipswich Town FC: P( 4 wins)  = 0.206486
Southampton FC: P( 2 wins)  = 0.278054
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📉 &lt;strong&gt;What These Results Tell Us&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Top teams like Liverpool have lower exact-repeat probabilities. Their win rate sits closer to 50%, where the binomial variance n·p·(1 − p) is largest, so the probability is spread over more possible win totals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lower-table teams tend to have higher repeat chances, but don't celebrate just yet...&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
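
&lt;p&gt;The pattern in these bullets can be checked numerically. The sketch below reuses the win totals from the sample output and prints each team's win rate, the variance of its win total, and the chance of repeating the exact total. The helper name &lt;code&gt;binomial_pmf&lt;/code&gt; is my own, and the 38-match season is assumed for every team.&lt;/p&gt;

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n = 38  # matches in a Premier League season (assumed for every team)
# Win totals taken from the sample output above.
last_season_wins = {"Liverpool FC": 25, "Arsenal FC": 20,
                    "Ipswich Town FC": 4, "Southampton FC": 2}

for team, wins in last_season_wins.items():
    p = wins / n
    variance = n * p * (1 - p)          # spread of the win total
    repeat = binomial_pmf(wins, n, p)   # chance of the exact same total
    print(f"{team}: p={p:.3f}, variance={variance:.2f}, P(repeat)={repeat:.4f}")
```

&lt;p&gt;The larger the variance, the smaller the chance of landing on exactly the same number of wins again.&lt;/p&gt;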

&lt;h2&gt;
  
  
  🏆 Step 3: Probability of Title-Winning Season (≥ 28 Wins)
&lt;/h2&gt;

&lt;p&gt;Next, we model the probability of each team reaching 28 or more wins, a typical threshold for winning the league.&lt;/p&gt;

&lt;p&gt;We'll use the cumulative binomial distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;binom&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;title_probability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;38&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wins&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;binom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Team&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;wins&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Wins&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;title_probability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: P(Wins ≥ 28) = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prob&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample output (selected teams, shown as percentages):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;P(Wins ≥ 28)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool FC&lt;/td&gt;
&lt;td&gt;19.78%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester City FC&lt;/td&gt;
&lt;td&gt;1.54%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal FC&lt;/td&gt;
&lt;td&gt;0.66%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea FC&lt;/td&gt;
&lt;td&gt;0.66%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle United&lt;/td&gt;
&lt;td&gt;0.66%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Manchester United FC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.00%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
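
&lt;p&gt;As a cross-check that does not need SciPy, the same tail probability can be summed term by term with the standard library. This is a minimal sketch: the helper name &lt;code&gt;prob_at_least&lt;/code&gt; is hypothetical, and the two win totals come from the Step 2 sample output.&lt;/p&gt;

```python
import math

def prob_at_least(threshold, n, p):
    """P(X >= threshold) for X ~ Binomial(n, p), summed term by term."""
    return sum(
        math.comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(threshold, n + 1)
    )

# Win totals taken from the Step 2 sample output (38-match season).
for team, wins in [("Liverpool FC", 25), ("Arsenal FC", 20)]:
    p = wins / 38
    print(f"{team}: P(Wins >= 28) = {prob_at_least(28, 38, p):.4%}")
```

&lt;p&gt;The results should agree with &lt;code&gt;1 - binom.cdf(threshold - 1, matches, p)&lt;/code&gt; up to floating-point rounding.&lt;/p&gt;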

&lt;h2&gt;
  
  
  📊 Interpretation
&lt;/h2&gt;

&lt;p&gt;Liverpool is the most likely team to hit 28+ wins, based on last season's win rate.&lt;/p&gt;

&lt;p&gt;City, Chelsea and the others trail well behind. The model only looks at wins, and they converted fewer matches into wins last season (more draws and losses).&lt;/p&gt;

&lt;p&gt;Man United? Their chance rounds to zero. Ouch 😬.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🫣 United fans, this model says your 11-win season gives you a statistically negligible shot at the title. You might want to pray harder than you code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  ⚠️ Limitations
&lt;/h2&gt;

&lt;p&gt;Let’s be honest, binomial probability isn’t a crystal ball. Here's why:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It ignores real-world dynamics: transfers, injuries, managerial changes.&lt;/li&gt;
&lt;li&gt;It assumes independent, identically distributed matches (which football is not).&lt;/li&gt;
&lt;li&gt;It is based on a single season, which is too small a sample for deep insight.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But hey, it’s fun and statistically grounded!&lt;/p&gt;

&lt;h2&gt;
  
  
  🧪 Want to Take This Further?
&lt;/h2&gt;

&lt;p&gt;Here’s how you can level up the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use Poisson regression to model goals per match.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integrate Elo ratings or other power metrics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run full Monte Carlo simulations of future fixtures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Track the model live across the season for dynamic probabilities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
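
&lt;p&gt;As a starting point for the Monte Carlo idea, here is a deliberately simplified sketch. It is not a fixture-level simulation: each match is an independent draw at the team's historical win rate, teams never play each other, and it counts who finishes with the most &lt;em&gt;wins&lt;/em&gt;, not the most points. The function name &lt;code&gt;simulate_most_wins&lt;/code&gt; and its parameters are hypothetical; the win totals are the ones quoted in this post.&lt;/p&gt;

```python
import random

def simulate_most_wins(win_rates, matches=38, seasons=2000, seed=42):
    """Estimate how often each team ends a season with the most wins.

    Simplifying assumptions: every match is an independent draw at the
    team's historical win rate, teams never face each other, and a tie
    for the most wins is shared equally between the tied teams.
    """
    rng = random.Random(seed)
    share = {team: 0.0 for team in win_rates}
    for _ in range(seasons):
        totals = {
            team: sum(p > rng.random() for _ in range(matches))
            for team, p in win_rates.items()
        }
        best = max(totals.values())
        leaders = [team for team, wins in totals.items() if wins == best]
        for team in leaders:
            share[team] += 1 / len(leaders)
    return {team: total / seasons for team, total in share.items()}

# Win rates from win totals quoted in this post (wins / 38 matches).
rates = {"Liverpool FC": 25 / 38, "Arsenal FC": 20 / 38,
         "Manchester United FC": 11 / 38}

for team, chance in simulate_most_wins(rates).items():
    print(f"{team}: most wins in {chance:.1%} of simulated seasons")
```

&lt;p&gt;Swapping the per-match draw for a fixture list with home and away effects would be the natural next step.&lt;/p&gt;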

&lt;h2&gt;
  
  
  💭 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;While this model won’t help you win your fantasy league, it does give a math-driven glimpse into who’s statistically positioned to succeed. Liverpool fans? You have reason to dream. Southampton? Maybe next year...&lt;/p&gt;

&lt;p&gt;Football is unpredictable, and that's what makes it beautiful. But every now and then, it's fun to let the math have a shot at calling the game. ⚽📊&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>api</category>
      <category>math</category>
    </item>
    <item>
      <title>Understanding Measures of Central Tendency in Data Science</title>
      <dc:creator>Naomi Jepkorir</dc:creator>
      <pubDate>Sun, 20 Jul 2025 21:11:12 +0000</pubDate>
      <link>https://dev.to/datawithnaomi/understanding-measures-of-central-tendency-in-data-science-12hc</link>
      <guid>https://dev.to/datawithnaomi/understanding-measures-of-central-tendency-in-data-science-12hc</guid>
      <description>&lt;p&gt;When you think of "mean", "median" or "mode", chances are your brain flashes back to a math class you didn’t think you'd ever use again. 😅&lt;/p&gt;

&lt;p&gt;But here I am, knee-deep in datasets, and those three little words keep showing up. Not just as formulas, but as powerful tools that help tell the &lt;em&gt;story&lt;/em&gt; behind the numbers.&lt;/p&gt;

&lt;p&gt;This post is part of my continued journey into data science. After exploring tools like Excel and Power BI, I started digging into core concepts, and &lt;strong&gt;measures of central tendency&lt;/strong&gt; are some of the first I’ve truly appreciated in the real world.&lt;/p&gt;

&lt;p&gt;Let’s break it down in plain English 👇&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Measures of Central Tendency? 🤔
&lt;/h2&gt;

&lt;p&gt;Measures of central tendency help us understand the “center” or “typical” value in a dataset. Basically, they summarize what’s "normal" in your data, and that's a huge help when you’re making sense of hundreds (or millions) of numbers.&lt;/p&gt;

&lt;p&gt;The three most common ones are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mean&lt;/strong&gt; - the average value&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Median&lt;/strong&gt; - the middle value&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mode&lt;/strong&gt; - the most frequently occurring value&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;They each tell you something slightly different, and choosing the right one depends on the situation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Do They Matter in Data Science? 🎯
&lt;/h2&gt;

&lt;p&gt;When you're working with data, you're usually trying to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Understand trends&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compare groups&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make decisions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build predictive models&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Measures of central tendency give you a quick pulse check on your dataset. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If you’re analyzing income data, the median might be better than the mean because of outliers (like billionaires).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you're reviewing customer ratings from 1 to 5 stars, the mode could show you the most common sentiment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If your data is pretty clean and normally distributed, the mean gives a solid summary.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
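
&lt;p&gt;Here is a small sketch of the first two bullets using Python's built-in &lt;code&gt;statistics&lt;/code&gt; module. The income figures and star ratings are made-up numbers for illustration only.&lt;/p&gt;

```python
import statistics

# Hypothetical monthly incomes; the last value is a deliberate outlier.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 5_000_000]

print("Mean:  ", statistics.mean(incomes))    # pulled up by the outlier
print("Median:", statistics.median(incomes))  # stays near the typical value

# Hypothetical 1-to-5 star ratings: the mode is the most common rating.
ratings = [5, 4, 5, 3, 5, 4, 1, 5]
print("Mode:  ", statistics.mode(ratings))
```

&lt;p&gt;One extreme income is enough to drag the mean far above what most people in the list earn, while the median barely notices it.&lt;/p&gt;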

&lt;h2&gt;
  
  
  Real-World Examples 🔍
&lt;/h2&gt;

&lt;p&gt;Here are a few situations where these measures pop up:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;📈 Business Reporting&lt;br&gt;
Companies use the mean to summarize average sales, costs or customer satisfaction scores over time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🏥 Healthcare&lt;br&gt;
Hospitals might use the median to report wait times, since a few extreme cases can skew the average.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🛍️ Retail and Marketing&lt;br&gt;
The mode helps track the most popular product sizes, colors or price points.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  A Quick Python Example 🐍
&lt;/h3&gt;

&lt;p&gt;If you’ve got a list of numbers, you can calculate all three super easily with Python’s built-in &lt;code&gt;statistics&lt;/code&gt; module:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# ≈ 3.44 (31 / 9)
&lt;/span&gt;&lt;span class="n"&gt;median&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 4
&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# 4
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;median&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These tiny lines of code can give you a huge amount of insight.&lt;/p&gt;
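
&lt;p&gt;One thing to watch out for: data can have more than one mode. As a small sketch (the ratings below are invented, and this assumes Python 3.8 or newer), &lt;code&gt;statistics.mode&lt;/code&gt; returns only the first of the tied values, while &lt;code&gt;statistics.multimode&lt;/code&gt; returns all of them.&lt;/p&gt;

```python
import statistics

# Hypothetical star ratings where 5 and 4 are tied for most common.
ratings = [5, 4, 4, 5, 3, 1, 5, 4]

first_mode = statistics.mode(ratings)      # first of the tied values encountered
all_modes = statistics.multimode(ratings)  # every tied value, in order of first appearance

print(first_mode, all_modes)
```

&lt;p&gt;If the list of modes has more than one entry, reporting just a single “most common” value would hide the tie.&lt;/p&gt;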

&lt;h2&gt;
  
  
  My Reflection 💭
&lt;/h2&gt;

&lt;p&gt;At first, I thought central tendency was just for passing stats exams. Now, I see it as one of the first things you should check when exploring a new dataset. It gives you a quick overview, helps spot data issues and sets the stage for deeper analysis or modeling.&lt;/p&gt;

&lt;p&gt;Plus, it’s foundational. Whether you're in Excel, Python or SQL, you'll use these concepts everywhere.&lt;/p&gt;

&lt;p&gt;If you're just getting started in data science like I am, don't overlook the basics. They’re called “central” for a reason. 😉&lt;/p&gt;

</description>
      <category>data</category>
      <category>datascience</category>
      <category>statistics</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How Excel is Used in Real-World Data Analysis</title>
      <dc:creator>Naomi Jepkorir</dc:creator>
      <pubDate>Wed, 11 Jun 2025 12:04:58 +0000</pubDate>
      <link>https://dev.to/datawithnaomi/how-excel-is-used-in-real-world-data-analysis-3hhp</link>
      <guid>https://dev.to/datawithnaomi/how-excel-is-used-in-real-world-data-analysis-3hhp</guid>
      <description>&lt;p&gt;When I started my journey in Data Science &amp;amp; Analytics, I knew Excel was a common tool in the workplace, but I didn’t realize just how powerful and versatile it really is. After just one week of learning Excel, I’ve already seen how it plays a major role in real-world data analysis and decision-making across many industries.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Excel? 🤔
&lt;/h2&gt;

&lt;p&gt;Microsoft Excel is a spreadsheet program that allows users to organize, analyze, and visualize data efficiently. It's widely used by professionals in fields like finance, marketing, operations and beyond. While it may seem simple at first glance, Excel offers a rich set of features that make it a go-to tool for data analysts around the world.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Uses of Excel in Data Analysis 🔍
&lt;/h2&gt;

&lt;p&gt;Here are just a few examples of how Excel is used in real-world data analysis:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Business Decision-Making&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Excel helps companies track performance metrics and make data-driven decisions. Dashboards built with Excel can show KPIs (Key Performance Indicators), trends and summaries that guide strategy and planning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Financial Reporting&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Financial analysts rely on Excel for budgeting, forecasting and generating reports. Excel’s formulas, templates and automation features reduce errors and save time on repetitive tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Marketing Performance Analysis&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Marketing teams use Excel to analyze campaign data, track conversions, segment audiences and measure ROI (Return on Investment). With features like pivot tables and filters, they can drill down into specific data segments easily.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Powerful Excel Features That Make Data Analysis Easy ⚙️
&lt;/h2&gt;

&lt;p&gt;In just a week, I've learned a few advanced Excel features and formulas that really opened my eyes to what’s possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;VLOOKUP()&lt;/code&gt; and &lt;code&gt;XLOOKUP()&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
These functions help find and connect data across large datasets. Whether matching IDs to names or merging data from multiple sources, they simplify complex lookups.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Validation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This feature helps ensure clean, consistent data entry. For example, limiting entries in a column to a specific list (like “Low,” “Medium,” “High”) helps prevent typos and standardizes the data for more accurate analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conditional Formatting&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This makes your data visually dynamic. You can highlight trends, outliers, or duplicates using color scales, icons, or rules. It’s especially helpful for quickly spotting which values stand out in a dataset.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Filters and Slicers&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Filters help you focus on specific data without deleting anything, and slicers add clickable filter buttons to tables and pivot tables. Together, they allow for interactive exploration and quick insights, like segmenting sales by region or category.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My Reflection 💭
&lt;/h2&gt;

&lt;p&gt;Learning Excel has changed the way I view data. Before, I saw spreadsheets as static and kind of boring — just tables of numbers. Now, I see them as dynamic tools for storytelling, insight and strategy. It’s amazing how much you can learn about a situation just by organizing the data correctly and applying the right formula. I’m excited to keep building my skills and see how Excel fits into more advanced analytics tools down the road.&lt;/p&gt;

</description>
      <category>data</category>
      <category>datascience</category>
      <category>beginners</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
