<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Loryne Joy Omwando</title>
    <description>The latest articles on DEV Community by Loryne Joy Omwando (@loryne_joy).</description>
    <link>https://dev.to/loryne_joy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3246937%2F32d37d55-f12c-499a-9a3d-43cb16cbf242.jpg</url>
      <title>DEV Community: Loryne Joy Omwando</title>
      <link>https://dev.to/loryne_joy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/loryne_joy"/>
    <language>en</language>
    <item>
      <title>RAG FOR DUMMIES</title>
      <dc:creator>Loryne Joy Omwando</dc:creator>
      <pubDate>Sun, 14 Sep 2025 17:34:01 +0000</pubDate>
      <link>https://dev.to/loryne_joy/rag-for-dummies-1bjj</link>
      <guid>https://dev.to/loryne_joy/rag-for-dummies-1bjj</guid>
      <description>&lt;p&gt;When I first heard the term &lt;strong&gt;RAG&lt;/strong&gt; (Retrieval-Augmented Generation), I honestly thought it was another one of those intimidating machine learning buzzwords that only AI researchers could understand. But after digging deeper, I realized RAG is actually a very practical concept—one that makes Large Language Models (LLMs) like GPT smarter, more accurate, and much more useful. If you’re new to AI, Machine Learning, or just curious about how modern AI systems answer questions so effectively, this article is for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is RAG?
&lt;/h2&gt;

&lt;p&gt;At its core, &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; is a technique that combines two worlds:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt; → Searching for relevant information from a knowledge base or database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt; → Using a language model to create a human-like answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of expecting an LLM to "memorize" the entire internet during training, RAG gives it the ability to &lt;strong&gt;look things up&lt;/strong&gt; in real time, and then use that retrieved information to generate better answers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why is RAG Needed?
&lt;/h2&gt;

&lt;p&gt;LLMs are powerful, but they have two major limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge cutoff&lt;/strong&gt; → They can’t know anything beyond the data they were trained on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination&lt;/strong&gt; → They sometimes make up answers confidently, even when wrong.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG addresses both issues by connecting the model to an &lt;strong&gt;external knowledge source&lt;/strong&gt; (like a vector database, Wikipedia, or your company’s documents). Instead of relying only on what it memorized during training, the model retrieves relevant facts first and then forms a response, which reduces (but does not eliminate) hallucination.&lt;/p&gt;

&lt;p&gt;Think of it like this: without RAG, an LLM is like a student trying to take an exam with no notes. With RAG, the student is allowed to bring reference books into the exam hall.&lt;/p&gt;




&lt;h2&gt;
  
  
  How RAG Works (Step by Step)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User asks a question&lt;/strong&gt; → e.g., &lt;em&gt;“What are the symptoms of diabetes?”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retriever fetches documents&lt;/strong&gt; → The system searches a knowledge base (medical docs, Wikipedia, etc.) and pulls relevant passages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generator creates an answer&lt;/strong&gt; → The LLM uses both the retrieved docs and its own language ability to craft a final response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because the answer is grounded in the retrieved passages, it tends to be both &lt;strong&gt;accurate&lt;/strong&gt; and &lt;strong&gt;well-written&lt;/strong&gt;.&lt;/p&gt;
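
&lt;p&gt;The three steps above can be sketched without any machine learning library. This is only a toy illustration: the knowledge base, the word-overlap scoring, and the function names (&lt;code&gt;retrieve&lt;/code&gt;, &lt;code&gt;build_prompt&lt;/code&gt;) are assumptions made up for this example, and a real system would use embeddings and send the prompt to an actual LLM.&lt;/p&gt;

```python
# Toy retrieve-then-generate sketch (hypothetical names, no real LLM involved)
import re

# Tiny stand-in knowledge base; the passages are illustrative only
KNOWLEDGE_BASE = [
    "Common symptoms of diabetes include increased thirst and frequent urination.",
    "Malaria is spread by the bite of infected Anopheles mosquitoes.",
    "The Iris dataset contains measurements of three species of iris flowers.",
]


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words.intersection(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages):
    """Combine the retrieved passages and the question into one prompt."""
    context = "\n".join("- " + passage for passage in passages)
    return "Answer using only this context:\n" + context + "\n\nQuestion: " + question


question = "What are the symptoms of diabetes?"
passages = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, passages)
print(prompt)  # this prompt would then be sent to an LLM of your choice
```

&lt;p&gt;The retriever picks the diabetes passage because it shares the most words with the question, and the generator only ever sees the question plus that passage.&lt;/p&gt;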




&lt;h2&gt;
  
  
  Example in Python (Simplified)
&lt;/h2&gt;

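&lt;p&gt;Here’s a minimal example using Hugging Face’s &lt;code&gt;transformers&lt;/code&gt; library with a pretrained RAG model (the retriever also needs the &lt;code&gt;datasets&lt;/code&gt; and &lt;code&gt;faiss&lt;/code&gt; packages installed):&lt;br&gt;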
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RagTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RagRetriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RagTokenForGeneration&lt;/span&gt;

&lt;span class="c1"&gt;# Load tokenizer, retriever, and model from the same fine-tuned checkpoint
&lt;/span&gt;&lt;span class="c1"&gt;# use_dummy_dataset=True loads a small sample index instead of the full Wikipedia index
&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RagTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;facebook/rag-token-nq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RagRetriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;facebook/rag-token-nq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_dummy_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RagTokenForGeneration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;facebook/rag-token-nq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Encode question
&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who is the president of Kenya in 2025?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Generate answer (it can only be as current as the documents in the retriever's index)
&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batch_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes a question.&lt;/li&gt;
&lt;li&gt;Retrieves relevant docs from a database.&lt;/li&gt;
&lt;li&gt;Generates a natural language answer using the docs + the model.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Where is RAG Used?
&lt;/h2&gt;

&lt;p&gt;RAG is not just theory—it’s already powering many real-world applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chatbots &amp;amp; Virtual Assistants&lt;/strong&gt; → They fetch accurate info from knowledge bases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer Support&lt;/strong&gt; → Agents use RAG systems to quickly answer FAQs from company docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare&lt;/strong&gt; → Doctors can query medical databases for up-to-date insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Education&lt;/strong&gt; → Students can ask questions, and the system cites textbooks or research papers.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Benefits of RAG
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Keeps answers up to date&lt;/li&gt;
&lt;li&gt;Reduces hallucinations&lt;/li&gt;
&lt;li&gt;Can handle specialized knowledge (finance, healthcare, law)&lt;/li&gt;
&lt;li&gt;More efficient than training a massive LLM from scratch&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Challenges of RAG
&lt;/h2&gt;

&lt;p&gt;Of course, RAG is not perfect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires a well-organized knowledge base.&lt;/li&gt;
&lt;li&gt;Retrieval quality matters: a bad retriever means bad answers.&lt;/li&gt;
&lt;li&gt;More computationally expensive than using just a plain LLM.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;RAG is a game-changer. Instead of forcing AI models to know everything, we let them &lt;strong&gt;fetch knowledge as needed&lt;/strong&gt;. It’s like giving AI both memory (retrieval) and intelligence (generation). As someone currently learning Data Science and AI, I see RAG as one of the most practical bridges between machine learning theory and real-world applications.&lt;/p&gt;

&lt;p&gt;If you’re diving into AI, understanding RAG will definitely give you an edge—not only in technical projects but also in appreciating how modern AI systems are evolving.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>Balancing Type I and Type II Errors in Medical Decisions: A Kenyan Perspective</title>
      <dc:creator>Loryne Joy Omwando</dc:creator>
      <pubDate>Fri, 29 Aug 2025 13:20:39 +0000</pubDate>
      <link>https://dev.to/loryne_joy/balancing-type-i-and-type-ii-errors-in-medical-decisions-a-kenyan-perspective-2adn</link>
      <guid>https://dev.to/loryne_joy/balancing-type-i-and-type-ii-errors-in-medical-decisions-a-kenyan-perspective-2adn</guid>
      <description>&lt;p&gt;When we study statistics, we often hear about Type 1 and Type 2 errors. But in real life, especially in medicine, these errors are not just theoretical—they can literally mean the difference between life and death. Understanding where to trade off these errors is crucial for doctors, public health policymakers, and even patients making informed decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Type 1 and Type 2 Errors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type 1 Error (False Positive)&lt;/strong&gt;: This occurs when we conclude that a condition or effect is present when it actually is not (in hypothesis-testing terms, rejecting a true null hypothesis). In medical terms, it’s diagnosing a patient with a disease they don’t have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type 2 Error (False Negative)&lt;/strong&gt;: This happens when we fail to detect a condition that really is present (failing to reject a false null hypothesis). Medically, it’s missing a diagnosis for a patient who actually has the disease.&lt;/li&gt;
&lt;/ul&gt;
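
&lt;p&gt;To make the two definitions concrete, here is a small sketch that counts both kinds of error from a list of true and predicted labels. The data and the helper name &lt;code&gt;error_rates&lt;/code&gt; are made up for illustration; they do not come from a real study.&lt;/p&gt;

```python
def error_rates(true_labels, predicted_labels):
    """Return (false_positive_rate, false_negative_rate) for 0/1 labels."""
    pairs = list(zip(true_labels, predicted_labels))
    false_pos = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    false_neg = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    healthy = sum(1 for truth in true_labels if truth == 0)
    sick = sum(1 for truth in true_labels if truth == 1)
    return false_pos / healthy, false_neg / sick


# Made-up data: 1 = has the disease, 0 = healthy
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

fpr, fnr = error_rates(true_labels, predicted_labels)
print("Type 1 (false positive) rate:", fpr)
print("Type 2 (false negative) rate:", fnr)
```

&lt;p&gt;Here one of the six healthy patients is wrongly flagged (Type 1) and one of the four sick patients is missed (Type 2).&lt;/p&gt;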

&lt;p&gt;The trade-off between these errors often feels like a tightrope walk. Reducing one can increase the other, and vice versa. So, how do we make this decision?&lt;/p&gt;

&lt;h2&gt;
  
  
  A Medical Scenario in Kenya: Malaria Testing
&lt;/h2&gt;

&lt;p&gt;Imagine you are a clinician in Kisumu, where malaria is prevalent. You have a diagnostic test for malaria with 95% accuracy. Now, consider the implications of each error type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type 1 Error (False Positive)&lt;/strong&gt;: You diagnose malaria in someone who doesn’t have it. The patient might receive unnecessary antimalarial drugs, which can lead to side effects and contribute to drug resistance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type 2 Error (False Negative)&lt;/strong&gt;: You miss malaria in a patient who actually has it. This patient may not receive treatment in time, leading to severe complications or even death.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clearly, in this scenario, &lt;strong&gt;Type 2 errors are more dangerous&lt;/strong&gt;. Therefore, it’s safer to accept a slightly higher rate of Type 1 errors (false positives) to minimize Type 2 errors (false negatives).&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing the Trade-Off in Python
&lt;/h2&gt;

&lt;p&gt;We can simulate this trade-off using Python. Let’s assume we adjust the threshold of a diagnostic test and observe how Type 1 and Type 2 errors change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

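&lt;span class="c1"&gt;# Simulated disease status (1 = has malaria)
&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;true_disease&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;binomial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 10% prevalence
&lt;/span&gt;
&lt;span class="c1"&gt;# Simulated test scores: infected patients tend to score higher (illustrative values)
&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;true_disease&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Candidate decision thresholds: a score at or above the threshold counts as positive
&lt;/span&gt;&lt;span class="n"&gt;thresholds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;false_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;false_negatives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;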
    &lt;span class="n"&gt;fp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;true_disease&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;true_disease&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;true_disease&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;true_disease&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;false_positives&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;false_negatives&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;false_positives&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Type 1 Error (FP)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;false_negatives&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Type 2 Error (FN)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Decision Threshold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Error Rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Trade-Off Between Type 1 and Type 2 Errors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the plot, we can see that as one error rate falls the other rises, so we can pick a threshold where Type 2 errors are low while accepting a higher Type 1 error rate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Decisions
&lt;/h2&gt;

&lt;p&gt;The trade-off between Type 1 and Type 2 errors depends on &lt;strong&gt;context and consequences&lt;/strong&gt;. In medical diagnostics, the severity of missing a disease often outweighs the inconvenience of a false alarm. In our malaria example, it is reasonable to tolerate some false positives to avoid missing actual malaria cases.&lt;/p&gt;

&lt;p&gt;In other contexts, such as a drug side effect study, you might want to minimize Type 1 errors to prevent falsely claiming a drug is harmful when it isn’t. The key is to carefully weigh the &lt;strong&gt;risks and consequences&lt;/strong&gt; before deciding on the acceptable balance.&lt;/p&gt;




&lt;p&gt;Understanding Type 1 and Type 2 errors is not just an academic exercise. It’s a vital part of making informed decisions, especially in healthcare. In Kenya, where resources and access to medical care vary, making the right trade-off can save lives.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Understanding Supervised Learning: A Deep Dive into Classification</title>
      <dc:creator>Loryne Joy Omwando</dc:creator>
      <pubDate>Fri, 22 Aug 2025 06:21:29 +0000</pubDate>
      <link>https://dev.to/loryne_joy/-understanding-supervised-learning-a-deep-dive-into-classification-31de</link>
      <guid>https://dev.to/loryne_joy/-understanding-supervised-learning-a-deep-dive-into-classification-31de</guid>
      <description>&lt;p&gt;Machine Learning has always sounded like something challenging and technical . But the more I study it, the more I realize it’s simply about teaching computers to &lt;em&gt;learn from data&lt;/em&gt;. One of the most important branches of Machine Learning I’ve been exploring lately is &lt;strong&gt;Supervised Learning&lt;/strong&gt;—and in this post, I want to focus specifically on &lt;strong&gt;classification&lt;/strong&gt;, sharing what I’ve learned so far, the models I’ve used, and some of the challenges I’ve faced as a student diving into this fascinating world.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Supervised Learning?
&lt;/h2&gt;

&lt;p&gt;As a former teacher, I have realised that supervised learning is like teaching a child using flashcards. You show them an apple, tell them “this is an apple,” and do the same with oranges, bananas, and so on. Over time, they start recognizing fruits on their own.&lt;/p&gt;

&lt;p&gt;In the same way, supervised learning uses &lt;strong&gt;labeled data&lt;/strong&gt;—meaning the input data already comes with the correct answers (labels). The algorithm studies this relationship and later predicts labels for unseen data.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If we feed a model patient data (like age, blood pressure, sugar levels) &lt;strong&gt;with labels&lt;/strong&gt; (“diabetic” or “not diabetic”), the model learns to classify new patients into these categories.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How Classification Works
&lt;/h2&gt;

&lt;p&gt;Classification is all about &lt;strong&gt;sorting things into groups&lt;/strong&gt;. The data has features (inputs), and the task is to predict which class (output) each data point belongs to.&lt;/p&gt;

&lt;p&gt;Here’s the step-by-step way I think about it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Collect and label data&lt;/strong&gt; – You need a dataset where the right answers (classes) are already known.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Train the model&lt;/strong&gt; – Feed this data to an algorithm so it can learn the relationship between features and labels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the model&lt;/strong&gt; – Check how well it predicts on unseen data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt; – Use it to make real-world decisions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A simple example from daily life: Gmail classifying emails into &lt;strong&gt;Spam&lt;/strong&gt; or &lt;strong&gt;Not Spam&lt;/strong&gt;. That’s binary classification. More complex tasks, like classifying animals into cats, dogs, or birds, are called &lt;strong&gt;multi-class classification&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Types of Classification
&lt;/h2&gt;

&lt;p&gt;When we first started learning about classification in class, I found it really helpful to understand that classification itself has different types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Binary Classification&lt;/strong&gt; – Only two classes (e.g., spam vs not spam).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Class Classification&lt;/strong&gt; – More than two classes, but each data point belongs to just one (e.g. classifying fruits into apple, banana, or mango).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Label Classification&lt;/strong&gt; – Each data point can belong to multiple categories at once (e.g. tagging a photo as “beach,” “sunset” and “friends”).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these types cleared up a lot of confusion for me when I was getting started!&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Models Used for Classification
&lt;/h2&gt;

&lt;p&gt;While exploring classification, I came across several algorithms, each with its own strengths and weaknesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logistic Regression&lt;/strong&gt; – Despite its name, it’s actually used for classification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Trees&lt;/strong&gt; – Easy to understand and interpret.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Random Forests&lt;/strong&gt; – A collection of decision trees working together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Vector Machines (SVMs)&lt;/strong&gt; – Great at finding boundaries between classes, but can be computationally heavy on large datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;k-Nearest Neighbors (kNN)&lt;/strong&gt; – Looks at “neighbors” to decide the class.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neural Networks&lt;/strong&gt; – Very powerful for complex tasks like image and speech recognition.&lt;/li&gt;
&lt;/ul&gt;
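
&lt;p&gt;Of the models listed above, kNN is simple enough to sketch in a few lines of plain Python. This is a rough illustration, not how scikit-learn implements it: the fruit measurements are made up, the function name &lt;code&gt;knn_predict&lt;/code&gt; is hypothetical, and the features are left unscaled for simplicity.&lt;/p&gt;

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    nearest = sorted(distances)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 2D points: (weight in grams, diameter in cm)
train_points = [(150, 7.0), (160, 7.5), (170, 7.2), (300, 9.0), (320, 9.5), (310, 9.2)]
train_labels = ["apple", "apple", "apple", "mango", "mango", "mango"]

print(knn_predict(train_points, train_labels, (155, 7.1)))
print(knn_predict(train_points, train_labels, (305, 9.1)))
```

&lt;p&gt;Each new fruit simply takes the label that most of its three closest training examples carry.&lt;/p&gt;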




&lt;h2&gt;
  
  
  💻 A Simple Python Example
&lt;/h2&gt;

&lt;p&gt;When I was first learning classification, writing the code felt overwhelming. But once I discovered &lt;strong&gt;scikit-learn&lt;/strong&gt;, things clicked. Here’s a simple example using Logistic Regression on the Iris dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_iris&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;

&lt;span class="c1"&gt;# Load dataset
&lt;/span&gt;&lt;span class="n"&gt;iris&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_iris&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iris&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iris&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;

&lt;span class="c1"&gt;# Split into train/test
&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Train model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Predict &amp;amp; evaluate
&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accuracy:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seeing the accuracy printed out for the first time was a fulfilling moment for me: the computer had actually “learned” from data!&lt;/p&gt;




&lt;h2&gt;
  
  
  My Personal Insights
&lt;/h2&gt;

&lt;p&gt;As a student in Machine Learning, I find classification exciting because it’s so close to real life. Every day, we make classifications in our minds—deciding if a matatu is too full , whether a mango is ripe, or even whether it will rain judging from the sky.&lt;/p&gt;

&lt;p&gt;One big lesson: &lt;strong&gt;no algorithm is a silver bullet&lt;/strong&gt;. Sometimes a simple logistic regression beats a fancy neural network, depending on the dataset.&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenges I’ve Faced
&lt;/h2&gt;

&lt;p&gt;It hasn’t been smooth sailing. Here are some of the struggles I’ve encountered while working with classification:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understanding the Types of Supervised Learning&lt;/strong&gt;: Differentiating between classification and regression was tough at first. Writing their Python code also felt intimidating because I didn’t know which libraries to use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Quality&lt;/strong&gt;: Missing values, duplicates, or wrong labels can ruin everything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overfitting&lt;/strong&gt;: My decision trees once performed perfectly on training data but terribly on test data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Computational Resources&lt;/strong&gt;: Neural networks are amazing, but without a good GPU they can be painfully slow, so I rely on Google Colab.&lt;/li&gt;
&lt;/ul&gt;
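&lt;p&gt;For the data quality point, a quick check before training can save a lot of pain. The sketch below is plain Python so it runs without extra libraries. The rows, the column names, and the quality_report helper are all hypothetical, and it only counts missing values and exact duplicate rows (wrong labels still need a human eye).&lt;/p&gt;

```python
from collections import Counter

# Hypothetical rows; None marks a missing value
rows = [
    {"id": 1, "age": 34, "label": "spam"},
    {"id": 2, "age": None, "label": "ham"},
    {"id": 3, "age": 29, "label": None},
    {"id": 1, "age": 34, "label": "spam"},  # exact duplicate of the first row
]


def quality_report(records):
    """Count missing values per column and fully duplicated rows."""
    missing = Counter()
    for record in records:
        for column, value in record.items():
            if value is None:
                missing[column] += 1
    # Every repeat of an identical row beyond its first appearance is a duplicate
    seen = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(count - 1 for count in seen.values())
    return dict(missing), duplicates


missing_counts, duplicate_rows = quality_report(rows)
print("Missing per column:", missing_counts)
print("Duplicate rows:", duplicate_rows)
```

&lt;p&gt;With pandas, the same idea is usually expressed with isna and duplicated on a DataFrame.&lt;/p&gt;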




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Supervised learning, and classification in particular, has given me a new appreciation of how data can drive intelligent decisions. From simple logistic regression to powerful neural networks, the journey of trying, failing, debugging, and improving has taught me not just technical skills but also patience.&lt;/p&gt;

&lt;p&gt;I’m still learning, but one thing is clear: &lt;strong&gt;classification is not just about algorithms—it’s about asking the right questions, preparing the right data, and interpreting results responsibly.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;I’d love to hear your thoughts in the comments under this article!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>⚽ Predicting 2024/25 Premier League Win Probabilities Using Python</title>
      <dc:creator>Loryne Joy Omwando</dc:creator>
      <pubDate>Wed, 30 Jul 2025 21:56:36 +0000</pubDate>
      <link>https://dev.to/loryne_joy/predicting-202425-premier-league-win-probabilities-using-python-19a7</link>
      <guid>https://dev.to/loryne_joy/predicting-202425-premier-league-win-probabilities-using-python-19a7</guid>
      <description>&lt;p&gt;In this project, I explored how to &lt;strong&gt;predict the probability of Premier League teams winning games in the 2024/25 season&lt;/strong&gt;, using their 2023/24 results as a baseline. I used Python, the API-Football API, and some light statistics to model each team's win probability.&lt;/p&gt;

&lt;p&gt;Let’s break it down 👇&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 The Goal
&lt;/h2&gt;

&lt;p&gt;Predict how many games each team is likely to win in the 2024/25 season using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧮 &lt;strong&gt;Bernoulli Distribution&lt;/strong&gt; (win or no win)&lt;/li&gt;
&lt;li&gt;🎲 &lt;strong&gt;Binomial Probability Model&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;📈 Visualizations with Seaborn &amp;amp; Matplotlib&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚙️ Tools &amp;amp; Libraries
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;binom&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  📦 Step 1: Pull 2023/24 Match Data from API-Football
&lt;/h2&gt;

&lt;p&gt;I used the &lt;a href="https://dashboard.api-football.com/" rel="noopener noreferrer"&gt;API-Football&lt;/a&gt; service to get Premier League match data for the 2023/24 season:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_api_key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://v3.football.api-sports.io&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;HEADERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x-apisports-key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;league&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;39&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;season&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2023&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/fixtures&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;HEADERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔍 Step 2: Process the Results
&lt;/h2&gt;

&lt;p&gt;Each match was inspected to determine the winning team, and I counted how many matches each team played and won.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fixture&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;short&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;home_team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;teams&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;home&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;away_team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;teams&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;away&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;home_goals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;goals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;home&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;away_goals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;goals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;away&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;home_goals&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;away_goals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;winner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;away_goals&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;home_goals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;winner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;winner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# draw
&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;home&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;away&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;winner&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;winner&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  📊 Step 3: Calculate Win Probabilities
&lt;/h2&gt;

&lt;p&gt;I grouped matches by team, calculated win rates (wins / games), and used the &lt;strong&gt;Binomial PMF&lt;/strong&gt; to estimate their chance of winning a given number of games in a 38-match season.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;home&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;union&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;away&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])))&lt;/span&gt;
&lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;played&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;home&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;away&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="n"&gt;wins&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;winner&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;win_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wins&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;played&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wins&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;wins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;played&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;played&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;win_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;win_rate&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;df_stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
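&lt;p&gt;To make the Binomial step concrete, here is a small plain-Python sketch of the PMF formula. It treats every match as an independent Bernoulli trial (win or no win) with the same win probability, which is a simplifying assumption of this model. The 19-wins-from-38 team is hypothetical, and binomial_pmf is my own helper name.&lt;/p&gt;

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical team: 19 wins from 38 games last season, so the win rate is 0.5
season_games = 38
win_rate = 19 / 38

prob_exactly_19 = binomial_pmf(19, season_games, win_rate)
# Probabilities over every possible win count (0 to 38) should add up to 1
total = sum(binomial_pmf(k, season_games, win_rate) for k in range(season_games + 1))

print(f"P(exactly 19 wins) = {prob_exactly_19:.4f}")
print(f"Sum over all outcomes = {total:.6f}")
```

&lt;p&gt;Even the single most likely outcome has a fairly small probability, which is why it makes sense to look at the whole distribution rather than one predicted number.&lt;/p&gt;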






&lt;h2&gt;
  
  
  📈 Step 4: Visualize the Prediction
&lt;/h2&gt;

&lt;p&gt;I used Seaborn to create a line plot for each team showing the probability distribution of their possible wins in the next season (assuming 38 games).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;season_games&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;38&lt;/span&gt;
&lt;span class="n"&gt;plot_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;win_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;season_games&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;binom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;season_games&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;plot_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wins&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;probability&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prob&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;viz_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plot_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lineplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;viz_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wins&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;probability&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Predicted Win Probability Distribution (2024/25 Season)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Number of Wins&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Probability&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tight_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furf7yoopnzy071ug0sh0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furf7yoopnzy071ug0sh0.png" alt="Predicted Win Probability Visual" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 Why This Matters
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;This approach doesn’t &lt;strong&gt;predict exact results&lt;/strong&gt;, but gives a solid &lt;strong&gt;probability profile&lt;/strong&gt; for each team.&lt;/li&gt;
&lt;li&gt;It’s helpful for &lt;strong&gt;analysts and fans&lt;/strong&gt; to understand team performance trends.&lt;/li&gt;
&lt;li&gt;One can improve this model by adding player-level data, home/away effects, injuries, or transfer impact.&lt;/li&gt;
&lt;/ul&gt;
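&lt;p&gt;One way to turn a probability profile into something easy to read is to summarize it with a few numbers. The sketch below assumes the same binomial model, a hypothetical win rate of 0.5, and a hypothetical target of 20 wins. The function names are mine, chosen for illustration.&lt;/p&gt;

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def win_profile(p, n=38, target=20):
    """Return expected wins, standard deviation, and P(at least target wins)."""
    expected = n * p                 # mean of a binomial distribution
    spread = sqrt(n * p * (1 - p))   # standard deviation of a binomial distribution
    at_least = sum(binomial_pmf(k, n, p) for k in range(target, n + 1))
    return expected, spread, at_least


# Hypothetical win rate of 0.5 over a 38-game season
expected, spread, at_least = win_profile(0.5)
print(f"Expected wins: {expected:.1f}")
print(f"Standard deviation: {spread:.2f}")
print(f"P(at least 20 wins): {at_least:.3f}")
```

&lt;p&gt;The standard deviation is a useful reminder of how wide the plausible range of wins is, even when last season's win rate is taken at face value.&lt;/p&gt;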




&lt;h2&gt;
  
  
  💭 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This was a fun exploration that blended &lt;strong&gt;sports and data science&lt;/strong&gt;. Combining historical data with probability theory gives deeper insights than a "gut feeling" alone.&lt;/p&gt;




&lt;p&gt;📂 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/loryneJoy/Python-Assignments.git" rel="noopener noreferrer"&gt;https://github.com/loryneJoy/Python-Assignments.git&lt;/a&gt;&lt;br&gt;&lt;br&gt;
🐍 &lt;strong&gt;Tags&lt;/strong&gt;: &lt;code&gt;#football&lt;/code&gt; &lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#data-science&lt;/code&gt; &lt;code&gt;#premier-league&lt;/code&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>datascience</category>
      <category>analytics</category>
      <category>python</category>
    </item>
    <item>
      <title>📊 The Measures of Central Tendency and Why They Matter in Data Science</title>
      <dc:creator>Loryne Joy Omwando</dc:creator>
      <pubDate>Sun, 20 Jul 2025 19:54:44 +0000</pubDate>
      <link>https://dev.to/loryne_joy/-the-measures-of-central-tendency-and-why-they-matter-in-data-science-1618</link>
      <guid>https://dev.to/loryne_joy/-the-measures-of-central-tendency-and-why-they-matter-in-data-science-1618</guid>
      <description>&lt;p&gt;Have you ever stared at a dataset and wondered: &lt;em&gt;“Okay... but what does all this really mean?”&lt;/em&gt;&lt;br&gt;
Welcome to the world of central tendency—your first step in summarizing data and making it &lt;em&gt;speak&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Whether you're a growing data analyst or a seasoned data scientist, understanding the core of your data starts here.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧠 What Are Measures of Central Tendency?
&lt;/h2&gt;

&lt;p&gt;In simple terms, &lt;strong&gt;measures of central tendency&lt;/strong&gt; help us find the &lt;em&gt;middle point&lt;/em&gt; or &lt;em&gt;typical value&lt;/em&gt; in a dataset. These measures include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mean&lt;/strong&gt; – the average&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Median&lt;/strong&gt; – the middle value&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mode&lt;/strong&gt; – the most frequent value&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of them like lenses: each one shows the data in a slightly different way.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧺 Why Are They Important in Data Science?
&lt;/h2&gt;

&lt;p&gt;Raw data can be messy, overwhelming, and misleading without context.&lt;/p&gt;

&lt;p&gt;When working with data, especially during &lt;strong&gt;exploratory data analysis (EDA)&lt;/strong&gt;, these measures help us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Summarize large datasets&lt;/strong&gt; with a single number&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detect outliers&lt;/strong&gt; and understand their impact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose appropriate models&lt;/strong&gt; (some ML algorithms assume normally distributed data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communicate insights&lt;/strong&gt; clearly to stakeholders who aren’t tech-savvy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are some practical examples 👇&lt;/p&gt;


&lt;h2&gt;
  
  
  📌 The Mean – "The Classic Average"
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;salaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;40000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;45000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;52000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;mean_salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salaries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The average salary is: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mean_salary&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;💡 &lt;strong&gt;But beware!&lt;/strong&gt; The mean is &lt;em&gt;sensitive to outliers&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What happens if we introduce a wildly high salary?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;salaries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Big CEO bonus!
&lt;/span&gt;&lt;span class="n"&gt;mean_salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salaries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New average salary: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mean_salary&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The average gets pulled up, even though most employees earn much less.&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 The Median – "The Middle Ground"
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;median_salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salaries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The median salary is: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;median_salary&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;median&lt;/strong&gt; resists outliers, making it a better choice when the data is skewed.&lt;/p&gt;

&lt;p&gt;👈 For example, in &lt;strong&gt;real estate prices&lt;/strong&gt;, &lt;strong&gt;income levels&lt;/strong&gt;, or &lt;strong&gt;housing rent&lt;/strong&gt;, the median gives a fairer picture.&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 The Mode – "The Most Popular Kid"
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;

&lt;span class="n"&gt;grades&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;most_common_grade&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grades&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The most common grade is: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;most_common_grade&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;mode&lt;/strong&gt; is especially useful for &lt;strong&gt;categorical data&lt;/strong&gt;, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most purchased product&lt;/li&gt;
&lt;li&gt;Favorite programming language&lt;/li&gt;
&lt;li&gt;Most common diagnosis in a hospital dataset&lt;/li&gt;
&lt;/ul&gt;
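&lt;p&gt;As a quick illustration of the mode on categorical data, here is a minimal sketch using Python's built-in &lt;code&gt;statistics&lt;/code&gt; module. The survey answers are made up for this example, and &lt;code&gt;multimode&lt;/code&gt; assumes Python 3.8 or newer.&lt;/p&gt;

```python
from statistics import mode, multimode

# Hypothetical survey answers: "What is your favorite programming language?"
languages = ["Python", "SQL", "Python", "R", "SQL", "Python"]
top_language = mode(languages)
print(f"Most popular language: {top_language}")

# When two categories tie, mode() returns only the first one it meets,
# while multimode() returns every value that shares the top count.
tied = ["Python", "SQL", "SQL", "Python", "R"]
tied_modes = multimode(tied)
print(f"Tied favorites: {tied_modes}")
```

&lt;p&gt;If ties are likely in your data, &lt;code&gt;multimode&lt;/code&gt; is usually the safer choice because it does not silently hide the runner-up.&lt;/p&gt;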




&lt;h2&gt;
  
  
  📉 When to Use Which?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Measure&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Avoid When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mean&lt;/td&gt;
&lt;td&gt;Symmetric distributions&lt;/td&gt;
&lt;td&gt;Data has outliers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median&lt;/td&gt;
&lt;td&gt;Skewed data or outliers&lt;/td&gt;
&lt;td&gt;Uniform distributions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mode&lt;/td&gt;
&lt;td&gt;Categorical data&lt;/td&gt;
&lt;td&gt;Continuous variables with few or no repeats&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🔍 Real-Life Use Case: House Prices
&lt;/h2&gt;

&lt;p&gt;Imagine you’re analyzing house prices in Nairobi:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;house_prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_200_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_300_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10_000_000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# 👀 big outlier!
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mean:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;house_prices&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Median:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;house_prices&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which one would you trust more to describe a "typical" house price?&lt;br&gt;
Definitely the &lt;strong&gt;median&lt;/strong&gt;—because that luxury mansion isn't your average listing.&lt;/p&gt;
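&lt;p&gt;To see how much the single mansion moves each measure, here is a small sketch that compares the same listings with and without the outlier. It uses the standard-library &lt;code&gt;statistics&lt;/code&gt; module instead of NumPy only to stay self-contained; the variable &lt;code&gt;typical_only&lt;/code&gt; is introduced here for illustration.&lt;/p&gt;

```python
from statistics import mean, median

house_prices = [1_000_000, 1_200_000, 1_300_000, 10_000_000]
typical_only = house_prices[:-1]  # the same listings without the mansion

# Print both measures for each version of the data
for label, prices in [("With outlier", house_prices), ("Without outlier", typical_only)]:
    print(f"{label}: mean = {mean(prices):,.0f}, median = {median(prices):,.0f}")
```

&lt;p&gt;The mean jumps from roughly 1.17 million to 3.375 million once the mansion is included, while the median only moves from 1.2 million to 1.25 million.&lt;/p&gt;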


&lt;h2&gt;
  
  
  🧠 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Mastering central tendency is more than just memorizing formulas.&lt;/p&gt;

&lt;p&gt;It’s about knowing &lt;strong&gt;which tool to use&lt;/strong&gt;, &lt;strong&gt;when to use it&lt;/strong&gt;, and &lt;strong&gt;why&lt;/strong&gt;. Data Science isn't just about models and code—it's about &lt;strong&gt;context and communication&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So the next time you are handed a CSV file full of numbers, don’t panic.&lt;/p&gt;

&lt;p&gt;Start with the basics.&lt;br&gt;
Start with central tendency.&lt;/p&gt;


&lt;h2&gt;
  
  
  ✅ TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mean&lt;/strong&gt; = average (useful, but sensitive to outliers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Median&lt;/strong&gt; = middle value (great for skewed data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mode&lt;/strong&gt; = most frequent value (perfect for categories)&lt;/li&gt;
&lt;li&gt;Use them in &lt;strong&gt;EDA&lt;/strong&gt;, &lt;strong&gt;data summaries&lt;/strong&gt;, and to &lt;strong&gt;build intuition&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;Thanks for reading! 🙌&lt;br&gt;
If you found this helpful, let’s connect or discuss below:&lt;br&gt;
What’s your go-to measure when you explore new data?&lt;/p&gt;





</description>
    </item>
    <item>
      <title>How I Built an RCPA Prescription Performance Dashboard in Power BI</title>
      <dc:creator>Loryne Joy Omwando</dc:creator>
      <pubDate>Thu, 10 Jul 2025 14:43:40 +0000</pubDate>
      <link>https://dev.to/loryne_joy/how-i-built-an-rcpa-prescription-performance-dashboard-in-power-bi-2f23</link>
      <guid>https://dev.to/loryne_joy/how-i-built-an-rcpa-prescription-performance-dashboard-in-power-bi-2f23</guid>
      <description>&lt;p&gt;Recently, I completed a rewarding Power BI project that involved transforming raw &lt;strong&gt;Retail Chemist Prescription Audit (RCPA)&lt;/strong&gt; data into an interactive dashboard that provides deep business insights. The challenge wasn't just in visualizing the data, but in cleaning, transforming, modeling, and telling a data-driven story that stakeholders could act upon.&lt;/p&gt;

&lt;p&gt;In this article, I’ll walk you through &lt;strong&gt;how I tackled the project from start to finish&lt;/strong&gt;, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ETL in Power Query&lt;/li&gt;
&lt;li&gt;Data modeling and relationships&lt;/li&gt;
&lt;li&gt;Key DAX measures&lt;/li&gt;
&lt;li&gt;Designing visuals for insights&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🗂️ Project Overview
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Create a dynamic Power BI dashboard to analyze prescription performance by doctor, brand, region, and medical rep, and to understand &lt;strong&gt;doctor conversion&lt;/strong&gt; and &lt;strong&gt;brand competition&lt;/strong&gt; trends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Objectives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean and transform raw RCPA data&lt;/li&gt;
&lt;li&gt;Build a structured data model with relationships&lt;/li&gt;
&lt;li&gt;Generate insightful visuals using DAX and Power BI visuals&lt;/li&gt;
&lt;li&gt;Help business users track brand performance and doctor behavior&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📦 Dataset Summary
&lt;/h2&gt;

&lt;p&gt;The dataset included four main tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RCPA Reporting Form:&lt;/strong&gt; Raw data on doctor prescriptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product Master:&lt;/strong&gt; Product and brand metadata&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brand Targets:&lt;/strong&gt; Expected prescription targets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expected Transformation Sheet:&lt;/strong&gt; Data transformation guide&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧼 Step 1: ETL with Power Query
&lt;/h2&gt;

&lt;p&gt;Using &lt;strong&gt;Power Query Editor&lt;/strong&gt;, I cleaned and transformed the raw datasets into analytics-ready tables:&lt;/p&gt;

&lt;h3&gt;
  
  
  🔹 Cleaning Tasks:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Removed duplicates and missing values&lt;/li&gt;
&lt;li&gt;Converted text-based numbers (e.g., "KSh 1,000") to numeric format&lt;/li&gt;
&lt;li&gt;Standardized column names and data types&lt;/li&gt;
&lt;/ul&gt;
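&lt;p&gt;The project did this cleaning in Power Query. Purely as an illustration of the text-to-number step, here is a rough Python sketch of the same idea. The function name &lt;code&gt;parse_amount&lt;/code&gt; and the sample strings are assumptions made for this example, not part of the original dataset.&lt;/p&gt;

```python
import re

def parse_amount(text):
    """Extract the first number from a currency string, e.g. 'KSh 1,000' gives 1000.0."""
    # Match an optional minus sign, digits with thousands separators,
    # and an optional decimal part; currency labels are ignored.
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # no number found, treat as a missing value
    return float(match.group().replace(",", ""))

# Hypothetical raw values as they might appear in a text column
for raw in ["KSh 1,000", "Ksh. 2,500.50", "n/a"]:
    print(repr(raw), parse_amount(raw))
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for unparseable text keeps bad rows visible so they can be handled together with the other missing values.&lt;/p&gt;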

&lt;h3&gt;
  
  
  🔹 Transformation Tasks:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Merged Product Master with RCPA Reporting Form to enrich product info&lt;/li&gt;
&lt;li&gt;Created &lt;strong&gt;RCPA Data Table&lt;/strong&gt; with relevant metrics (Brand, Doctor, Med Rep)&lt;/li&gt;
&lt;li&gt;Created &lt;strong&gt;Competitor RCPA Data Table&lt;/strong&gt; for competitor comparisons&lt;/li&gt;
&lt;li&gt;Aggregated prescription counts and values as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This step ensured clean, structured data that could be used reliably in the data model and visuals.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Step 2: Building the Data Model
&lt;/h2&gt;

&lt;p&gt;I designed a &lt;strong&gt;star schema&lt;/strong&gt; where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fact tables:&lt;/strong&gt; RCPA Data and Competitor RCPA Data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dimension tables:&lt;/strong&gt; Product Master and Brand Targets&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔁 Relationships Created:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Product Master&lt;/code&gt; ➝ &lt;code&gt;RCPA Data&lt;/code&gt; (based on product/brand)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Brand Targets&lt;/code&gt; ➝ &lt;code&gt;RCPA Data&lt;/code&gt; (to compare actual vs. target Rx)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Product Master&lt;/code&gt; ➝ &lt;code&gt;Competitor RCPA Data&lt;/code&gt; (for brand competition)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All relationships were tested and configured with correct cardinality and filter directions.&lt;/p&gt;




&lt;h2&gt;
  
  
  📈 Step 3: Visualizing Insights
&lt;/h2&gt;

&lt;p&gt;With the model in place, I designed a clean and interactive dashboard in Power BI, which included:&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Visuals Built:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Doctor Prescription (Rx) Performance&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Bar/Column charts to show prescription volume per doctor vs. brand targets&lt;/li&gt;
&lt;li&gt;Filterable by Region and Medical Rep&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Doctor Conversion Status&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Used DAX to calculate if a doctor met or exceeded target prescriptions for &lt;strong&gt;3+ consecutive RCPA periods&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Displayed with icons and color-coded status indicators&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Brand Competition&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Stacked column charts comparing our brand’s performance against competitors&lt;/li&gt;
&lt;li&gt;Segmented by region and product category&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The project repository is available 👉 &lt;a href="https://github.com/loryneJoy/POWERBI_Lux_DS_Project.git" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🔢 Key DAX Measures
&lt;/h2&gt;

&lt;p&gt;Some example DAX measures used:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Total prescriptions in the current filter context
Total Rx = SUM('RCPA Data'[Prescription Quantity])

// Compare the Total Rx measure with the summed target for the same context
Target Met = 
IF(
    [Total Rx] &amp;gt;= SUM('Brand Targets'[Target Qty]), 
    "Yes", 
    "No"
)

// Doctor Conversion: custom logic to track 3 consecutive periods (simplified here)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
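&lt;p&gt;The Doctor Conversion rule (target met or exceeded for 3+ consecutive RCPA periods) is not shown in DAX above. To make the logic concrete, here is a minimal Python sketch of the streak check. It is not the measure used in the dashboard; the function names and sample numbers are hypothetical.&lt;/p&gt;

```python
def longest_streak(rx_by_period, target):
    """Length of the longest run of consecutive periods where Rx met the target."""
    best = current = 0
    for rx in rx_by_period:
        # Extend the run when the target is met, otherwise start over
        current = current + 1 if rx >= target else 0
        best = max(best, current)
    return best

def is_converted(rx_by_period, target, min_periods=3):
    """A doctor counts as converted after min_periods consecutive on-target periods."""
    return longest_streak(rx_by_period, target) >= min_periods

# Hypothetical prescription counts per RCPA period, in chronological order
print(is_converted([12, 15, 9, 11, 10, 14], target=10))
print(is_converted([12, 9, 15, 8, 11], target=10))
```

&lt;p&gt;The first doctor has a run of three on-target periods at the end and counts as converted; the second never strings together more than one, so does not.&lt;/p&gt;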

</description>
    </item>
    <item>
      <title>HOW EXCEL IS USED IN REAL-WORLD DATA ANALYSIS</title>
      <dc:creator>Loryne Joy Omwando</dc:creator>
      <pubDate>Mon, 09 Jun 2025 18:51:48 +0000</pubDate>
      <link>https://dev.to/loryne_joy/how-excel-is-used-in-real-world-data-analysis-47na</link>
      <guid>https://dev.to/loryne_joy/how-excel-is-used-in-real-world-data-analysis-47na</guid>
      <description>&lt;h2&gt;
  
  
  INTRODUCTION TO EXCEL: A Data Analyst’s Multi-Purpose Tool
&lt;/h2&gt;

&lt;p&gt;My perception of Excel changed when I enrolled in a Data Analytics course at LuxDevHQ. Before that, I saw Excel as a tool for basic calculations, lists, budgets and schedules. Working with it has taught me that Excel is much more than rows and columns; it is a way to generate actionable insights and make informed decisions.&lt;/p&gt;

&lt;p&gt;Excel is an effective tool for analyzing and visualizing data. Although other data analysis tools exist, such as Python, Tableau and SQL, Excel remains one of the easiest to access and learn because it is already available on many computers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;What is Data Analysis?&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
This is the process of examining, cleaning, transforming and modelling data (raw facts or information) to identify patterns and trends that inform decision making.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;What is Data?&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
Data refers to raw, unprocessed facts that have not yet been cleaned or analyzed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk666h39arvgjo9bbp33o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk666h39arvgjo9bbp33o.jpg" alt=" " width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;EXCEL IN THE REAL WORLD&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Excel is widely used around the world and across many different sectors. &lt;br&gt;
&lt;strong&gt;&lt;em&gt;Financial Reporting and Budgeting:&lt;/em&gt;&lt;/strong&gt; Companies use Excel to analyze quarterly revenue, create detailed financial reports, track spending and forecast future spending. &lt;br&gt;
&lt;strong&gt;&lt;em&gt;Business Actions:&lt;/em&gt;&lt;/strong&gt; Excel is used by businesses to monitor important metrics like financial results, customer trends, and sales performance. Users can find trends, compare performance over time, and pinpoint areas for improvement with the help of features like PivotTables, charts, formulas, and conditional formatting.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Marketing Performance Tracking:&lt;/em&gt;&lt;/strong&gt; With Excel, businesses can identify current sales trends, monitor how they change, and calculate return on investment (ROI).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw8kw92kn2iln0a2fxpg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw8kw92kn2iln0a2fxpg.jpg" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;USEFUL EXCEL FEATURES/ FORMULAS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;1. VLOOKUP ():&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
I did not know this function existed, but I now see how useful it is for finding information in large data sets. VLOOKUP (vertical lookup) searches for a value in the first column of a table and returns a value from the same row in a column you specify. Its syntax looks like this: =VLOOKUP(lookup value, range containing the lookup value, column number in the range containing the return value, approximate match (TRUE) or exact match (FALSE)).&lt;br&gt;
For example, if column A holds item names and column B holds item prices in rows 2 to 10, the formula below finds "Calculator" in column A and returns its price from column B.&lt;br&gt;
=VLOOKUP("Calculator",A2:B10,2,FALSE)&lt;/p&gt;
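&lt;p&gt;For readers who also use Python, the exact-match behaviour described above can be sketched in a few lines. This only illustrates the idea and is not how Excel implements the function; the &lt;code&gt;vlookup&lt;/code&gt; helper and the price list are made up for this example.&lt;/p&gt;

```python
def vlookup(lookup_value, table, col_index):
    """Exact-match lookup: scan the first column, return the value in col_index (1-based)."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A when nothing matches

# Hypothetical item list: column 1 is the item name, column 2 is the price
price_table = [
    ("Notebook", 150),
    ("Calculator", 1200),
    ("Pen", 30),
]

print(vlookup("Calculator", price_table, 2))
print(vlookup("Stapler", price_table, 2))
```

&lt;p&gt;As in Excel, the lookup value must sit in the first column of the range, and the column number counts from 1.&lt;/p&gt;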

&lt;p&gt;&lt;em&gt;&lt;strong&gt;2. Conditional Formatting:&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
Conditional formatting changes how a cell looks when its value meets a rule you set. For example, to highlight test scores below 50%, select the scores, then choose Conditional Formatting &amp;gt; Highlight Cells Rules &amp;gt; Less Than and pick a fill colour.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;3. Index- Match:&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
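INDEX and MATCH are two functions that combine into a flexible lookup: MATCH finds the position of a value and INDEX returns the entry at that position. Together they can pull information such as a client's phone number based on a unique identifier, which is especially handy when working with multiple sheets of data.&lt;/p&gt;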

&lt;p&gt;&lt;strong&gt;&lt;em&gt;4. Pivot Tables:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Pivot tables are a great way to summarize data. You can group a large data list by item, region, cost, rating and so on, and then generate charts from the same summary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2adwpeu8yfvk5u5xjc7a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2adwpeu8yfvk5u5xjc7a.jpg" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;PERSONAL REFLECTION&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Realizing that data isn't only for tech experts or statisticians has been my biggest shift. Before learning and exploring Excel, I saw data as dry and uninteresting. I now see data, and the process of cleaning and analyzing it, as an unfolding story, and I know that learning Excel is building the skills I need to turn raw data into business value. This gives me a sense of empowerment and encourages me to keep learning about Excel and data analysis in general.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01ss8v6ei8mnxgrmmu95.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01ss8v6ei8mnxgrmmu95.jpg" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>learning</category>
      <category>datascience</category>
      <category>analytics</category>
    </item>
  </channel>
</rss>
