<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chanchal Singh</title>
    <description>The latest articles on DEV Community by Chanchal Singh (@brains_behind_bots).</description>
    <link>https://dev.to/brains_behind_bots</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3132277%2F81626fba-8d4f-4b07-bae4-cc27f3ff31ac.jpg</url>
      <title>DEV Community: Chanchal Singh</title>
      <link>https://dev.to/brains_behind_bots</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brains_behind_bots"/>
    <language>en</language>
    <item>
      <title>Day 5 : Is Your Model Actually Good? - Evaluation Metrics</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Thu, 22 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-5-is-your-model-actually-good-evaluation-metrics-1bm7</link>
      <guid>https://dev.to/brains_behind_bots/day-5-is-your-model-actually-good-evaluation-metrics-1bm7</guid>
      <description>&lt;p&gt;You prepare for an exam.&lt;/p&gt;

&lt;p&gt;You give a mock test.&lt;br&gt;
You get &lt;strong&gt;72 marks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now the real question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did I pass or fail?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The real question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“How good is 72?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Is it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better than before?&lt;/li&gt;
&lt;li&gt;Good enough?&lt;/li&gt;
&lt;li&gt;Just lucky?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s exactly what &lt;strong&gt;model evaluation&lt;/strong&gt; is about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why We Need Evaluation
&lt;/h2&gt;

&lt;p&gt;A model can always give predictions.&lt;/p&gt;

&lt;p&gt;But prediction alone means nothing.&lt;/p&gt;

&lt;p&gt;We must ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can I trust this model?&lt;/li&gt;
&lt;li&gt;Will it work on new data?&lt;/li&gt;
&lt;li&gt;Is it learning patterns or memorizing data?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Evaluation answers these questions.&lt;/p&gt;




&lt;h2&gt;
  
  
  R-squared (R²): The Most Popular Metric
&lt;/h2&gt;

&lt;p&gt;Imagine this.&lt;/p&gt;

&lt;p&gt;You’re trying to predict &lt;strong&gt;house prices&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Before using ML, your best guess is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“All houses cost around ₹50 lakh.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s your &lt;strong&gt;baseline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now your model predicts different prices for different houses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumc6e5qlbrkyhzzi9kk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumc6e5qlbrkyhzzi9kk0.png" alt="R square visualization" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;R² asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How much better is your model compared to this dumb guess?”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  R² in simple words
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;R² tells you how much of the problem your model explains.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwj50qj54d0k9k1p855w7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwj50qj54d0k9k1p855w7.png" alt="R² demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;R² = 0.80 → the model explains 80% of the pattern&lt;/li&gt;
&lt;li&gt;R² = 0.20 → the model explains very little&lt;/li&gt;
&lt;li&gt;R² = 1 → perfect (rare, and usually suspicious)&lt;/li&gt;
&lt;/ul&gt;
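&lt;p&gt;Here’s that idea as a tiny Python sketch (the prices and predictions are invented for illustration):&lt;/p&gt;

```python
# R² compares your model against the "dumb guess" (always predicting the mean)
actual    = [50, 60, 70, 80]   # true house prices, in lakh (made-up data)
predicted = [52, 58, 71, 79]   # your model's predictions (made-up data)

mean_price = sum(actual) / len(actual)

# How wrong the dumb guess is, as total squared error
ss_total = sum((a - mean_price) ** 2 for a in actual)

# How wrong your model is
ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))   # 0.98 -- the model explains almost all the variation
```

&lt;p&gt;A value near 1 means the model beats the baseline by a lot; a value near 0 means it barely helps.&lt;/p&gt;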




&lt;h2&gt;
  
  
  Important Truth About R²
&lt;/h2&gt;

&lt;p&gt;A high R² does &lt;strong&gt;not always mean a good model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can overfit&lt;/li&gt;
&lt;li&gt;It can memorize&lt;/li&gt;
&lt;li&gt;It can fail on new data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why we never trust R² alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Residuals: Listening to the Model’s Mistakes
&lt;/h2&gt;

&lt;p&gt;Residual = actual value − predicted value.&lt;/p&gt;

&lt;p&gt;Think of residuals as the &lt;strong&gt;model’s complaints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If residuals look:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random → model is healthy&lt;/li&gt;
&lt;li&gt;Patterned → model is missing something&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Residual plots help us see:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is the model behaving logically?”&lt;/p&gt;
&lt;/blockquote&gt;
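&lt;p&gt;In code, residuals are just one subtraction per data point (the numbers below are made up):&lt;/p&gt;

```python
# Residual = actual value minus predicted value, one per prediction
actual    = [100, 120, 140, 160]
predicted = [98, 123, 138, 161]

residuals = [a - p for a, p in zip(actual, predicted)]
print(residuals)   # [2, -3, 2, -1] -- scattered around zero, no pattern: healthy
```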




&lt;h2&gt;
  
  
  Standard Error (SE): How Confident Is the Model?
&lt;/h2&gt;

&lt;p&gt;Imagine two friends predicting house prices.&lt;/p&gt;

&lt;p&gt;Friend A:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Usually wrong by ₹5,000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Friend B:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Usually wrong by ₹50,000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Who do you trust more?&lt;/p&gt;

&lt;p&gt;Standard Error tells you:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“On average, how far the predictions are from the truth.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lower SE = more reliable model.&lt;/p&gt;
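&lt;p&gt;One common way to estimate that “typical miss” is the root of the average squared residual (made-up numbers again):&lt;/p&gt;

```python
# Typical distance between prediction and truth:
# square the misses, average them, then take the square root
actual    = [100, 120, 140, 160]
predicted = [95, 125, 135, 165]   # every guess is off by 5

mean_squared_miss = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
standard_error = mean_squared_miss ** 0.5
print(standard_error)   # 5.0 -- on average, predictions miss by about 5
```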




&lt;h2&gt;
  
  
  Train vs Test Performance (Very Important)
&lt;/h2&gt;

&lt;p&gt;If:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training accuracy is very high&lt;/li&gt;
&lt;li&gt;Testing accuracy is low&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model memorized instead of learning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is how we detect &lt;strong&gt;overfitting&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Like a student who learns answers by heart but fails when questions change.&lt;/p&gt;

&lt;p&gt;That’s overfitting in a nutshell: the model knows the past too well,&lt;br&gt;
but can’t handle anything new.&lt;/p&gt;
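&lt;p&gt;A toy way to see this: a “model” that only memorizes its training answers looks perfect on data it has seen and useless on anything new (everything below is invented):&lt;/p&gt;

```python
# A "memorizer": stores every training answer, has no idea otherwise
train_data = {1: 20, 2: 40, 3: 60}   # hours studied: marks (seen in training)
test_data  = {4: 80, 5: 100}         # new, unseen questions

def memorizer(hours):
    return train_data.get(hours, 0)  # no pattern learned, just a lookup

train_misses = [abs(marks - memorizer(h)) for h, marks in train_data.items()]
test_misses  = [abs(marks - memorizer(h)) for h, marks in test_data.items()]

print(sum(train_misses))   # 0 -- "perfect" on training data
print(sum(test_misses))    # 180 -- falls apart on anything new
```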




&lt;h2&gt;
  
  
  Tiny Real-Life Thought 🧠
&lt;/h2&gt;

&lt;p&gt;If someone always scores high in practice tests&lt;br&gt;
but fails in the real exam —&lt;/p&gt;

&lt;p&gt;you know something is wrong.&lt;/p&gt;

&lt;p&gt;Same with ML models.&lt;/p&gt;




&lt;h2&gt;
  
  
  3-Line Takeaway
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Evaluation tells you whether a model is trustworthy&lt;/li&gt;
&lt;li&gt;R² shows how much variation the model explains&lt;/li&gt;
&lt;li&gt;SE shows how reliable its predictions are&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next 👀
&lt;/h3&gt;

&lt;p&gt;Now the big question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why do some models fail even when metrics look good?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That leads us to:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 6 — Why Linear Regression Breaks (Assumptions &amp;amp; Multicollinearity)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Day 4 : How Machines Learn From Their Mistakes</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Tue, 20 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-4-how-machines-learn-from-their-mistakes-1ih</link>
      <guid>https://dev.to/brains_behind_bots/day-4-how-machines-learn-from-their-mistakes-1ih</guid>
      <description>&lt;p&gt;Imagine you are standing on a &lt;strong&gt;hill at night&lt;/strong&gt; 🌙.&lt;br&gt;
It’s dark. Fog everywhere.&lt;/p&gt;

&lt;p&gt;Your goal?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reach the &lt;strong&gt;lowest point&lt;/strong&gt; of the hill.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But there’s a problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can’t see the whole hill&lt;/li&gt;
&lt;li&gt;You can only see &lt;strong&gt;one step ahead&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what do you do?&lt;/p&gt;

&lt;p&gt;You take a small step &lt;strong&gt;downwards&lt;/strong&gt;.&lt;br&gt;
Then another.&lt;br&gt;
Then another.&lt;/p&gt;

&lt;p&gt;Slowly… you reach the bottom.&lt;/p&gt;

&lt;p&gt;That is &lt;strong&gt;Gradient Descent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg61mqkaidhusu5zlxeec.png" alt="Gradient Descent 3D Graph"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Problem Is Gradient Descent Solving?
&lt;/h2&gt;

&lt;p&gt;From Day 3, we learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every model makes mistakes&lt;/li&gt;
&lt;li&gt;Those mistakes are measured using &lt;strong&gt;loss&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now the big question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How does the model reduce this loss?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By slowly adjusting itself in the right direction.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That adjustment process is called &lt;strong&gt;Gradient Descent&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Think Like the Model 🧠
&lt;/h2&gt;

&lt;p&gt;The model keeps asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Am I too high?”&lt;/li&gt;
&lt;li&gt;“Am I too low?”&lt;/li&gt;
&lt;li&gt;“Which direction reduces my mistake?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it &lt;strong&gt;moves step by step&lt;/strong&gt; to reduce loss.&lt;/p&gt;

&lt;p&gt;Not randomly.&lt;br&gt;
Not all at once.&lt;br&gt;
Slowly and carefully.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Moving?
&lt;/h2&gt;

&lt;p&gt;Remember the straight line from Day 2?&lt;/p&gt;

&lt;p&gt;(&lt;a href="https://dev.to/brains_behind_bots/day-2-linear-regression-how-a-straight-line-learns-from-data-222l"&gt;Day 2 — Linear Regression: How a Straight Line Learns From Data&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;That line depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coefficients&lt;/li&gt;
&lt;li&gt;Intercept&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gradient Descent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tweaks these values&lt;/li&gt;
&lt;li&gt;Checks loss again&lt;/li&gt;
&lt;li&gt;Tweaks again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Until loss becomes &lt;strong&gt;as small as possible&lt;/strong&gt;.&lt;/p&gt;
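&lt;p&gt;That tweak-check-tweak loop can be sketched in a few lines of Python: fitting a line y = w * x + b to toy data by repeatedly stepping against the gradient of the mean squared error (the data and settings here are purely illustrative):&lt;/p&gt;

```python
# Gradient descent for a line: y = w * x + b
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # hidden truth: y = 2 * x

w, b = 0.0, 0.0             # start with a terrible line
learning_rate = 0.05
n = len(xs)

for _ in range(2000):
    # Gradients of mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # One small step downhill
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b

print(round(w, 2))   # close to 2.0 -- the slope of the hidden truth
```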




&lt;h2&gt;
  
  
  Learning Rate: Size of the Step 👣
&lt;/h2&gt;

&lt;p&gt;Now comes an important choice.&lt;/p&gt;

&lt;p&gt;How &lt;strong&gt;big&lt;/strong&gt; should each step be?&lt;/p&gt;

&lt;p&gt;That choice is called &lt;strong&gt;learning rate&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  If the learning rate is too big 🚀
&lt;/h3&gt;

&lt;p&gt;You jump too far.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Miss the bottom&lt;/li&gt;
&lt;li&gt;Bounce around&lt;/li&gt;
&lt;li&gt;Never settle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Like jumping down stairs instead of walking.&lt;/p&gt;




&lt;h3&gt;
  
  
  If the learning rate is too small 🐢
&lt;/h3&gt;

&lt;p&gt;You move very slowly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’ll reach the bottom&lt;/li&gt;
&lt;li&gt;But it’ll take forever&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Like taking baby steps on a long road.&lt;/p&gt;




&lt;p&gt;📌 &lt;strong&gt;Good learning rate = steady, confident steps&lt;/strong&gt;&lt;/p&gt;
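&lt;p&gt;You can feel this with a one-parameter toy loss, loss(p) = p ** 2, whose gradient is 2 * p (the step counts and rates below are arbitrary choices, just for the demo):&lt;/p&gt;

```python
# Walk downhill on the toy loss p ** 2, starting from p = 10
def final_position(learning_rate, steps=20):
    p = 10.0
    for _ in range(steps):
        p = p - learning_rate * 2 * p   # gradient of p ** 2 is 2 * p
    return p

print(final_position(0.1))     # steady steps: ends close to the bottom at 0
print(final_position(1.1))     # too big: overshoots and bounces further away
print(final_position(0.001))   # too small: after 20 steps, barely moved from 10
```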




&lt;h2&gt;
  
  
  Why Feature Scaling Matters Here
&lt;/h2&gt;

&lt;p&gt;Imagine walking downhill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One step forward = 1 meter&lt;/li&gt;
&lt;li&gt;One step sideways = 1 kilometer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Movement becomes awkward.&lt;/p&gt;

&lt;p&gt;Same with data.&lt;/p&gt;

&lt;p&gt;If one feature is very large and another is very small:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradient Descent struggles&lt;/li&gt;
&lt;li&gt;Learning becomes slow or unstable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feature scaling makes all features:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Speak the &lt;strong&gt;same language&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
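&lt;p&gt;One common fix is standardization: shift each feature to mean 0 and rescale by its spread, so every feature speaks the same language (toy numbers below):&lt;/p&gt;

```python
# Standardize a feature: subtract the mean, divide by the spread
def standardize(values):
    mean = sum(values) / len(values)
    spread = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / spread for v in values]

house_size_sqft = [800, 1200, 1600]   # big numbers
num_rooms       = [2, 3, 4]           # small numbers

print(standardize(house_size_sqft))
print(standardize(num_rooms))         # both land on the same scale
```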




&lt;h2&gt;
  
  
  When Gradient Descent Stops
&lt;/h2&gt;

&lt;p&gt;Gradient Descent stops when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loss stops decreasing&lt;/li&gt;
&lt;li&gt;Model is no longer improving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That point is called:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Minimum loss&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the “bottom of the hill”.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tiny Thought Experiment 🧠
&lt;/h2&gt;

&lt;p&gt;Trying to lose weight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sudden extreme dieting ❌&lt;/li&gt;
&lt;li&gt;Slow, consistent effort ✅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gradient Descent believes in &lt;strong&gt;consistency&lt;/strong&gt;, not shortcuts.&lt;/p&gt;




&lt;h2&gt;
  
  
  3-Line Takeaway
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Gradient Descent reduces loss step by step&lt;/li&gt;
&lt;li&gt;Learning rate controls step size&lt;/li&gt;
&lt;li&gt;Feature scaling helps learning move smoothly&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next 👀
&lt;/h3&gt;

&lt;p&gt;Now the question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do we know if the model we trained is actually good?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where &lt;strong&gt;evaluation metrics&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 5 — Is Your Regression Model Any Good? (Evaluation Metrics)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Day 3 — Errors &amp; Loss Functions: Measuring How Wrong a Model Is</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Mon, 19 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-3-errors-loss-functions-measuring-how-wrong-a-model-is-h29</link>
      <guid>https://dev.to/brains_behind_bots/day-3-errors-loss-functions-measuring-how-wrong-a-model-is-h29</guid>
      <description>&lt;p&gt;You’re trying to guess your &lt;strong&gt;monthly electricity bill&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You think:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Maybe around ₹1,500 this month.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The bill arrives.&lt;/p&gt;

&lt;p&gt;Actual bill: &lt;strong&gt;₹1,620&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You smile and say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Hmm… close, but not exact.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That &lt;strong&gt;gap&lt;/strong&gt; between what you guessed and what actually happened&lt;br&gt;
is called &lt;strong&gt;error&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  So, What Is Error Really?
&lt;/h2&gt;

&lt;p&gt;In simple human language:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Error is how far your guess is from reality.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predicted number → your guess&lt;/li&gt;
&lt;li&gt;Actual number → truth&lt;/li&gt;
&lt;li&gt;Difference → error&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every prediction has an error.&lt;br&gt;
Even humans make them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Errors Are Normal (And Not a Problem)
&lt;/h2&gt;

&lt;p&gt;Real life is not neat.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;People behave differently&lt;/li&gt;
&lt;li&gt;Weather changes&lt;/li&gt;
&lt;li&gt;Markets move randomly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So expecting &lt;strong&gt;perfect predictions&lt;/strong&gt; is unrealistic.&lt;/p&gt;

&lt;p&gt;Machine learning doesn’t try to be perfect.&lt;br&gt;
It tries to be &lt;strong&gt;less wrong every time&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Absolute Error: “Just Tell Me How Wrong I Am”
&lt;/h2&gt;

&lt;p&gt;Imagine your friend asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I don’t care if you guessed more or less.&lt;br&gt;
Just tell me how off you were.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That thinking is called &lt;strong&gt;Absolute Error&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You predicted too high → error&lt;/li&gt;
&lt;li&gt;You predicted too low → error&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only the &lt;strong&gt;size of the mistake&lt;/strong&gt; matters.&lt;/p&gt;
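&lt;p&gt;In code, absolute error is one function call (the bill amounts are made up):&lt;/p&gt;

```python
# Absolute error: size of the miss, ignoring direction
actual_bill = 1620
guess_high  = 1700   # guessed above the real bill
guess_low   = 1540   # guessed below it

print(abs(actual_bill - guess_high))   # 80
print(abs(actual_bill - guess_low))    # 80 -- same error either way
```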




&lt;h2&gt;
  
  
  One Guess Is Not Enough
&lt;/h2&gt;

&lt;p&gt;Now imagine this:&lt;/p&gt;

&lt;p&gt;You guessed the bill &lt;strong&gt;every month for a year&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Some months: Very close&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some months: Way off&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now the question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Overall, how good are my guesses?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To answer that, we need a &lt;strong&gt;single score&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That score is called a &lt;strong&gt;loss&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvch8j0vdge34tdh21u04.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvch8j0vdge34tdh21u04.png" alt="Loss Function explanation with example" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Loss Function: The Model’s Report Card
&lt;/h2&gt;

&lt;p&gt;Think of a loss function like a &lt;strong&gt;report card&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It looks at &lt;strong&gt;all mistakes together&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Gives &lt;strong&gt;one number&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Lower number = better performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Models don’t feel emotions.&lt;br&gt;
They only understand numbers.&lt;/p&gt;

&lt;p&gt;Loss tells them:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“You’re doing okay”&lt;br&gt;
or&lt;br&gt;
“You’re doing badly — improve.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Mean Squared Error: Why Big Mistakes Hurt More
&lt;/h2&gt;

&lt;p&gt;Now here’s the clever part.&lt;/p&gt;

&lt;p&gt;Imagine two mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One mistake of ₹50&lt;/li&gt;
&lt;li&gt;One mistake of ₹500&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which one should worry you more?&lt;/p&gt;

&lt;p&gt;Obviously, &lt;strong&gt;₹500&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Mean Squared Error (MSE) thinks the same way.&lt;/p&gt;

&lt;p&gt;It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes small mistakes small&lt;/li&gt;
&lt;li&gt;Makes big mistakes &lt;strong&gt;very big&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This forces the model to say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I must avoid big blunders.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s why MSE is widely used in linear regression.&lt;br&gt;
Not because it’s fancy.&lt;br&gt;
Because it matches human common sense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One-line memory hook:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"&lt;strong&gt;MSE shouts at big mistakes and whispers at small ones.&lt;/strong&gt;"&lt;/p&gt;
&lt;/blockquote&gt;
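&lt;p&gt;A quick sketch of why squaring “shouts at big mistakes”, using the ₹50 and ₹500 misses from above:&lt;/p&gt;

```python
# Compare a small miss and a big miss, before and after squaring
errors = [50, 500]   # rupees off, per prediction

mean_absolute_error = sum(abs(e) for e in errors) / len(errors)
mean_squared_error  = sum(e ** 2 for e in errors) / len(errors)

print(mean_absolute_error)   # 275.0 -- the big miss counts 10x the small one
print(mean_squared_error)    # 126250.0 -- squared, it counts 100x the small one
```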




&lt;h2&gt;
  
  
  How This Chooses the Best Line
&lt;/h2&gt;

&lt;p&gt;Remember the straight line for Linear Regression from Day 2?&lt;/p&gt;

&lt;p&gt;Linear regression:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tries many possible lines&lt;/li&gt;
&lt;li&gt;Calculates loss for each line&lt;/li&gt;
&lt;li&gt;Picks the line with &lt;strong&gt;lowest loss&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s how the “best line” is chosen.&lt;/p&gt;

&lt;p&gt;Not by looks.&lt;br&gt;
By &lt;strong&gt;least mistake&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tiny Thought Experiment 🧠
&lt;/h2&gt;

&lt;p&gt;If your predictions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always off by ₹20 → acceptable&lt;/li&gt;
&lt;li&gt;Sometimes off by ₹500 → dangerous&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Loss functions feel the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Takeaways (Remember These)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Error = mistake for one prediction&lt;/li&gt;
&lt;li&gt;Loss = overall mistake score&lt;/li&gt;
&lt;li&gt;MSE punishes big mistakes more&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next 👀
&lt;/h3&gt;

&lt;p&gt;Now the big question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How does the model actually reduce this loss?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where training begins.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 4 — Teaching the Model to Improve (Gradient Descent)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Day 2 — Linear Regression: How a Straight Line Learns From Data</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Sat, 17 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-2-linear-regression-how-a-straight-line-learns-from-data-222l</link>
      <guid>https://dev.to/brains_behind_bots/day-2-linear-regression-how-a-straight-line-learns-from-data-222l</guid>
      <description>&lt;p&gt;Riya is in school.&lt;br&gt;
Exams are coming.&lt;/p&gt;

&lt;p&gt;Her elder sister notices something interesting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Study Hours&lt;/th&gt;
&lt;th&gt;Marks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 hours&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3 hours&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The sister laughs and says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Arre, the more you study, the more marks you get — very predictable!”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Without knowing it, &lt;strong&gt;Riya’s sister just did Linear Regression&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  So… What Is Linear Regression Really?
&lt;/h2&gt;

&lt;p&gt;Forget the big name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linear Regression simply means:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Finding a straight-line relationship between input and output.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In normal human language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input increases&lt;/li&gt;
&lt;li&gt;Output increases (or decreases)&lt;/li&gt;
&lt;li&gt;In a &lt;strong&gt;steady, predictable way&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That steady behavior is the key.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a “Straight Line”?
&lt;/h2&gt;

&lt;p&gt;Because life is sometimes simple.&lt;/p&gt;

&lt;p&gt;Think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More work experience → more salary&lt;/li&gt;
&lt;li&gt;Bigger house → higher price&lt;/li&gt;
&lt;li&gt;More units used → higher electricity bill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your brain already expects a &lt;strong&gt;straight pattern&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Linear regression just &lt;strong&gt;draws that pattern using data&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is the Model Actually Doing?
&lt;/h2&gt;

&lt;p&gt;Imagine a board with many dots on it 📍&lt;br&gt;
Each dot is one real example.&lt;/p&gt;

&lt;p&gt;Linear regression’s job is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Let me draw ONE straight line that passes as close as possible to all these dots.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6uehz16b52356bs42hr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6uehz16b52356bs42hr.png" alt="Linear Regression graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not touching every dot.&lt;br&gt;
Not perfect.&lt;br&gt;
Just &lt;strong&gt;the best overall line&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s it. That’s the model.&lt;/p&gt;
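&lt;p&gt;That “best overall line” can be computed directly. Here is a minimal pure-Python sketch that fits Riya’s table using the least-squares formula (the three rows from the table above):&lt;/p&gt;

```python
# Riya's data: study hours vs marks
hours = [1, 2, 3]
marks = [20, 40, 60]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Least squares: slope = covariance(x, y) / variance(x)
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
den = sum((x - mean_x) ** 2 for x in hours)
slope = num / den
intercept = mean_y - slope * mean_x

print(slope, intercept)       # 20.0 0.0
print(slope * 4 + intercept)  # 80.0 -- predicted marks for 4 hours of study
```

&lt;p&gt;With this toy data the line comes out as marks = 20 × hours, so 4 hours of study predicts 80 marks.&lt;/p&gt;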




&lt;h2&gt;
  
  
  Simple vs Multiple Linear Regression
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Simple Linear Regression
&lt;/h3&gt;

&lt;p&gt;One input → one output&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hours studied → Marks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Multiple Linear Regression
&lt;/h3&gt;

&lt;p&gt;Many inputs → one output&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;House size&lt;/li&gt;
&lt;li&gt;Number of rooms&lt;/li&gt;
&lt;li&gt;Location&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ House price&lt;/p&gt;

&lt;p&gt;Same idea.&lt;br&gt;
Just &lt;strong&gt;more information&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd0x5azit2anx5ce2li6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd0x5azit2anx5ce2li6.png" alt="Simple vs Multiple Linear regression graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
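&lt;p&gt;The “same idea, more information” point can be sketched with NumPy’s least-squares solver. The housing numbers below are made up for illustration; the prices are generated from known weights, so we can watch the fit recover them:&lt;/p&gt;

```python
import numpy as np

# Hypothetical housing data: each row is (size in 100 sq ft, rooms)
features = np.array([
    [10.0, 2.0],
    [20.0, 3.0],
    [30.0, 4.0],
    [40.0, 2.0],
])
# Prices generated as 100*size + 50*rooms + 10, so the fit should recover these weights
prices = np.array([1110.0, 2160.0, 3210.0, 4110.0])

# Add a column of ones so the intercept is learned as one extra coefficient
design = np.hstack([features, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(design, prices, rcond=None)

print(coef)  # approximately [100., 50., 10.]
```

&lt;p&gt;Simple regression is the one-column version of exactly this; multiple regression just stacks more columns into the design matrix.&lt;/p&gt;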




&lt;h2&gt;
  
  
  Coefficients — The Real Power
&lt;/h2&gt;

&lt;p&gt;Imagine an HR manager deciding your salary.&lt;/p&gt;

&lt;p&gt;She looks at two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your &lt;strong&gt;experience&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Your &lt;strong&gt;skills&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But she doesn’t treat them equally.&lt;/p&gt;

&lt;p&gt;Imagine this formula (don’t fear it):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Salary =&lt;br&gt;
(Experience × 5000) + (Skills × 3000) + Base Pay&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those numbers &lt;strong&gt;5000&lt;/strong&gt; and &lt;strong&gt;3000&lt;/strong&gt; are called &lt;strong&gt;coefficients&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;She thinks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Experience adds &lt;strong&gt;a lot&lt;/strong&gt; of value.”&lt;/li&gt;
&lt;li&gt;“Skills add value too, but &lt;strong&gt;a little less&lt;/strong&gt;.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those hidden importance levels are exactly what the &lt;strong&gt;coefficients&lt;/strong&gt; capture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3rvz229t89qbz2b0b80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3rvz229t89qbz2b0b80.png" alt="HR deciding your salary based on various factors" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If something changes the salary &lt;strong&gt;more&lt;/strong&gt;, it gets a &lt;strong&gt;bigger number&lt;/strong&gt;.&lt;br&gt;
If it changes the salary &lt;strong&gt;less&lt;/strong&gt;, it gets a &lt;strong&gt;smaller number&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just like cooking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Salt affects taste a lot&lt;/li&gt;
&lt;li&gt;Chili affects taste, but less&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why companies love linear regression.&lt;br&gt;
It doesn’t just predict a number — it &lt;strong&gt;explains why&lt;/strong&gt; that number makes sense.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bigger coefficient = bigger influence.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Simple.&lt;/p&gt;
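&lt;p&gt;The HR formula above translates directly into code. The base pay of 20000 is an assumed number for illustration, just like the two coefficients:&lt;/p&gt;

```python
# Hypothetical weights picked for illustration, not learned from data
EXPERIENCE_WEIGHT = 5000   # each year of experience adds 5000
SKILLS_WEIGHT = 3000       # each skill point adds value too, but a little less
BASE_PAY = 20000           # the intercept: salary at zero experience and zero skills

def salary(experience, skills):
    return experience * EXPERIENCE_WEIGHT + skills * SKILLS_WEIGHT + BASE_PAY

print(salary(2, 3))                  # 39000
print(salary(3, 3) - salary(2, 3))   # 5000: one extra year moves salary by its coefficient
```

&lt;p&gt;Notice how readable the prediction is: each input’s contribution is just its value times its coefficient. That is the “explains why” part.&lt;/p&gt;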




&lt;h2&gt;
  
  
  Intercept — The Starting Point
&lt;/h2&gt;

&lt;p&gt;What if someone has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0 experience&lt;/li&gt;
&lt;li&gt;0 skills&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Will salary be zero?&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;There’s usually a &lt;strong&gt;base salary&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That base value is called the &lt;strong&gt;intercept&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In simple words:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The intercept is the output value when every input is zero, the point where the line crosses the axis.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Linear Regression Is Everywhere
&lt;/h2&gt;

&lt;p&gt;Because it is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy to understand&lt;/li&gt;
&lt;li&gt;Fast to train&lt;/li&gt;
&lt;li&gt;Easy to explain to managers&lt;/li&gt;
&lt;li&gt;Very popular in interviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interview truth:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;They don’t care if you remember the formula.&lt;br&gt;
They care if you &lt;strong&gt;understand the behavior&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  When This Straight Line Becomes a Bad Idea
&lt;/h2&gt;

&lt;p&gt;Now imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Salary jumps suddenly&lt;/li&gt;
&lt;li&gt;Prices go up and down randomly&lt;/li&gt;
&lt;li&gt;Data looks like curves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trying to force a straight line there is like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Using a ruler to measure a circle.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It won’t work well.&lt;/p&gt;

&lt;p&gt;We’ll break this down properly later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tiny Brain Exercise 🧠
&lt;/h2&gt;

&lt;p&gt;Think about your monthly mobile bill.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More data used → higher bill&lt;/li&gt;
&lt;li&gt;Less data → lower bill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You already expect a straight relationship.&lt;/p&gt;

&lt;p&gt;That expectation is &lt;strong&gt;linear regression intuition&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  3 Things You Must Remember
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Linear regression fits a &lt;strong&gt;straight line&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Coefficients show &lt;strong&gt;importance&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Intercept is the &lt;strong&gt;starting value&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next 👀
&lt;/h3&gt;

&lt;p&gt;Now that we have a line…&lt;/p&gt;

&lt;p&gt;Big question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do we know if this line is good or terrible?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where &lt;strong&gt;errors and loss functions&lt;/strong&gt; enter.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 3 — Errors &amp;amp; Loss Functions: Measuring How Wrong a Model Is&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>ai</category>
    </item>
    <item>
      <title>Day 1: Regression — The Art of Prediction</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Fri, 16 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-1-regression-the-art-of-prediction-26aa</link>
      <guid>https://dev.to/brains_behind_bots/day-1-regression-the-art-of-prediction-26aa</guid>
      <description>&lt;h3&gt;
  
  
  Imagine this 👇
&lt;/h3&gt;

&lt;p&gt;You run a small &lt;strong&gt;chai stall&lt;/strong&gt; ☕.&lt;br&gt;
Every day people come and ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Bhaiya, aaj kitni chai bikegi?” (“Bhaiya, how much chai will sell today?”)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You think for a second and say:&lt;br&gt;
“Yesterday it was cold, more people came… today it’s sunny, maybe less.”&lt;/p&gt;

&lt;p&gt;Without knowing it, &lt;strong&gt;you are already doing regression&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  1️⃣ What is Regression?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Regression means:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Using past information to predict a number in the future.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s it. No fancy definition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Examples:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Predict house price&lt;/td&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predict salary&lt;/td&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predict temperature&lt;/td&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predict pass/fail&lt;/td&gt;
&lt;td&gt;❌ Not regression&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;👉 &lt;strong&gt;If the output is a NUMBER → it’s regression&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2️⃣ Why Do We Need Regression?
&lt;/h2&gt;

&lt;p&gt;Because humans:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guess roughly&lt;/li&gt;
&lt;li&gt;Forget patterns&lt;/li&gt;
&lt;li&gt;Get biased&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Machines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remember all data&lt;/li&gt;
&lt;li&gt;See patterns clearly&lt;/li&gt;
&lt;li&gt;Give consistent predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we let the &lt;strong&gt;machine learn from past data&lt;/strong&gt; and predict for us.&lt;/p&gt;




&lt;h2&gt;
  
  
  3️⃣ Input &amp;amp; Output
&lt;/h2&gt;

&lt;p&gt;Think of regression like a &lt;strong&gt;juice machine&lt;/strong&gt; &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Part&lt;/th&gt;
&lt;th&gt;ML Term&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fruits you put in&lt;/td&gt;
&lt;td&gt;Input / Features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Juice you get&lt;/td&gt;
&lt;td&gt;Output / Target&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Example:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inputs:&lt;/strong&gt; House size, number of rooms, location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; House price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regression learns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If inputs look like this → output is usually that”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  4️⃣ Regression vs Classification
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Regression&lt;/th&gt;
&lt;th&gt;Classification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Predicts numbers&lt;/td&gt;
&lt;td&gt;Predicts labels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Salary = ₹50,000&lt;/td&gt;
&lt;td&gt;Spam / Not Spam&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;House price&lt;/td&gt;
&lt;td&gt;Yes / No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temperature&lt;/td&gt;
&lt;td&gt;Pass / Fail&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;📌 &lt;strong&gt;Interview rule:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If output is continuous → Regression&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5️⃣ Real-Life Use Cases
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Regression Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Finance&lt;/td&gt;
&lt;td&gt;Loan amount prediction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Healthcare&lt;/td&gt;
&lt;td&gt;Recovery time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real estate&lt;/td&gt;
&lt;td&gt;House prices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E-commerce&lt;/td&gt;
&lt;td&gt;Demand forecasting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weather&lt;/td&gt;
&lt;td&gt;Rainfall amount&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Regression is &lt;strong&gt;everywhere&lt;/strong&gt;, quietly working.&lt;/p&gt;




&lt;h2&gt;
  
  
  6️⃣ Supervised Learning
&lt;/h2&gt;

&lt;p&gt;Imagine a child is learning maths.&lt;/p&gt;

&lt;p&gt;The teacher:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shows a question&lt;/li&gt;
&lt;li&gt;Shows the &lt;strong&gt;correct answer&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Corrects mistakes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Slowly, the child learns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“When I see this kind of question, the answer is usually this.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s &lt;strong&gt;supervised learning&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Now Apply This to Regression
&lt;/h3&gt;

&lt;p&gt;In regression, the &lt;strong&gt;machine is the child&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We give the machine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inputs&lt;/strong&gt; → house size, rooms, location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correct output&lt;/strong&gt; → actual house price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the machine learns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“When these inputs appear together, this is the price.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is called Supervised Learning because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model is &lt;strong&gt;not guessing blindly&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;We already &lt;strong&gt;know the right answers&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;We “supervise” the learning by correcting it&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Simple Rule to Remember
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If the data already has correct answers → it’s supervised learning&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Tiny Real-Life Analogy
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Learning Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Teacher checks homework&lt;/td&gt;
&lt;td&gt;Supervised&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Child learns alone by trial&lt;/td&gt;
&lt;td&gt;Unsupervised&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Regression = &lt;strong&gt;teacher checking homework&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Regression is a supervised learning algorithm because the model learns from labeled data where the correct output is already known.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Supervised learning = learning with answers&lt;/li&gt;
&lt;li&gt;Regression always learns this way&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  7️⃣ Tiny Intuition Practice
&lt;/h2&gt;

&lt;p&gt;Think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your phone price&lt;/li&gt;
&lt;li&gt;Inputs: RAM, storage, brand&lt;/li&gt;
&lt;li&gt;Output: Price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your brain already does regression.&lt;br&gt;
ML just does it &lt;strong&gt;faster and better&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  8️⃣ 3-Line Takeaway (Remember This)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Regression predicts &lt;strong&gt;numbers&lt;/strong&gt;, not labels&lt;/li&gt;
&lt;li&gt;It learns patterns from &lt;strong&gt;past data&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;You already use regression in daily life&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next
&lt;/h3&gt;

&lt;p&gt;Now that we know &lt;strong&gt;what regression is&lt;/strong&gt;, the next question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“How does a machine actually learn the best prediction?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where &lt;strong&gt;Linear Regression&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 2: How a Straight Line Learns From Data&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day 9: Bootstrapping Made Simple: The Easiest Way to Understand Resampling</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Tue, 25 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-9-bootstrapping-made-simple-the-easiest-way-to-understand-resampling-4ob6</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-9-bootstrapping-made-simple-the-easiest-way-to-understand-resampling-4ob6</guid>
      <description>&lt;p&gt;What do you do when your dataset is small, you can’t collect more data, and every conclusion feels unreliable?&lt;/p&gt;

&lt;p&gt;Most beginners think the only answer is: “Get more data.”&lt;br&gt;
But statisticians discovered a smarter trick decades ago.&lt;/p&gt;

&lt;p&gt;They learned how to squeeze hundreds of new datasets out of one tiny dataset—&lt;br&gt;
without changing a single value in it.&lt;/p&gt;

&lt;p&gt;This trick is called Bootstrapping,&lt;br&gt;
and once you understand it, your confidence intervals, model stability, and estimates will instantly make more sense.&lt;/p&gt;

&lt;p&gt;Let’s break it down in the simplest way possible.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What is Resampling?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Resampling means:&lt;br&gt;
&lt;strong&gt;Taking samples from your existing data again and again to learn more about the population.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is used when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is small&lt;/li&gt;
&lt;li&gt;You can’t collect more data&lt;/li&gt;
&lt;li&gt;You want to estimate accuracy or uncertainty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two main types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bootstrapping&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A resampling method where you create many new datasets by sampling &lt;strong&gt;with replacement&lt;/strong&gt; to estimate a statistic’s accuracy and uncertainty.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Jackknife&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A resampling method where you repeatedly drop one data point at a time to estimate a statistic’s stability, bias, or variance.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What is Bootstrapping?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Imagine you have &lt;strong&gt;one small dataset&lt;/strong&gt;.&lt;br&gt;
Bootstrapping lets you create &lt;strong&gt;hundreds or thousands of new datasets&lt;/strong&gt; from it.&lt;/p&gt;

&lt;p&gt;How?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You randomly pick values from your original data WITH replacement&lt;/strong&gt;&lt;br&gt;
(meaning an item can repeat).&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Original data = [5, 8, 9, 6]&lt;/p&gt;

&lt;p&gt;A bootstrap sample could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[5, 9, 9, 6], or&lt;/li&gt;
&lt;li&gt;[8, 5, 8, 9]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each new sample has the &lt;strong&gt;same length&lt;/strong&gt; as the original.&lt;/p&gt;
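&lt;p&gt;Sampling with replacement is one line in Python’s standard library. A quick sketch with the same four values:&lt;/p&gt;

```python
import random

random.seed(42)  # only so the demo is reproducible

data = [5, 8, 9, 6]

# Sampling WITH replacement: every pick comes from the full list,
# so the same value can appear more than once
bootstrap_sample = random.choices(data, k=len(data))
print(bootstrap_sample)
```

&lt;p&gt;Run it a few times without the seed and you get a different bootstrap sample each time, always the same length as the original.&lt;/p&gt;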

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdyr8zy30ku9eljiwvxu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdyr8zy30ku9eljiwvxu.png" alt="Bootstrap demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why do this?&lt;/p&gt;

&lt;p&gt;Because it lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Estimate the &lt;strong&gt;true mean&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Estimate &lt;strong&gt;confidence intervals&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Measure &lt;strong&gt;uncertainty&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All without needing a large dataset.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why Do We Use Bootstrapping?&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Why Bootstrapping Helps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Estimate confidence intervals&lt;/td&gt;
&lt;td&gt;Works even with small sample sizes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test hypotheses&lt;/td&gt;
&lt;td&gt;No need for normal distribution assumption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assess model stability&lt;/td&gt;
&lt;td&gt;Train models on bootstrap samples&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Estimate error&lt;/td&gt;
&lt;td&gt;Helps measure variance and bias&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Bootstrapping is used widely in ML:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random Forest (bootstrap aggregation)&lt;/li&gt;
&lt;li&gt;Bagging models&lt;/li&gt;
&lt;li&gt;Model variance estimation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Super Simple Example&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Imagine you have &lt;strong&gt;only 10 students’ marks&lt;/strong&gt;.&lt;br&gt;
You want to estimate the true class average.&lt;/p&gt;

&lt;p&gt;But 10 students is too small.&lt;/p&gt;

&lt;p&gt;So you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Randomly pick 10 marks &lt;strong&gt;with replacement&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Calculate the average&lt;/li&gt;
&lt;li&gt;Repeat 1,000 times&lt;/li&gt;
&lt;li&gt;Look at all 1,000 averages&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These 1,000 averages show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How stable the average is&lt;/li&gt;
&lt;li&gt;What range it falls in&lt;/li&gt;
&lt;li&gt;How uncertain your estimate is&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps you say something like:&lt;/p&gt;

&lt;p&gt;"There is a 95% chance the true average lies between 72 and 79."&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why Bootstrapping Is So Powerful&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Works even for &lt;strong&gt;tiny datasets&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No assumptions about data shape&lt;/li&gt;
&lt;li&gt;Very easy to compute&lt;/li&gt;
&lt;li&gt;Used in many ML ensemble models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bootstrapping basically says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If I could collect more data, this is what it might look like.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Statistics Day 8: Understanding A/B Testing and Market Basket Analysis Without the Jargon</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Sat, 22 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-8-understanding-ab-testing-and-market-basket-analysis-without-the-jargon-19m</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-8-understanding-ab-testing-and-market-basket-analysis-without-the-jargon-19m</guid>
      <description>&lt;p&gt;Statistics Challenge for Data Scientists&lt;/p&gt;

&lt;p&gt;Today, we’ll understand two very practical ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A/B Testing – how to compare two options and choose the better one using data.&lt;/li&gt;
&lt;li&gt;Market Basket Analysis – how to find which items are often bought together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simple concepts, but still very useful for a data scientist.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. What is A/B Testing?
&lt;/h2&gt;

&lt;p&gt;A/B testing is like a fair competition between two versions of something to see which one works better.&lt;/p&gt;

&lt;p&gt;You create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version A&lt;/li&gt;
&lt;li&gt;Version B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you show A to some people, B to some other people, and compare results.&lt;/p&gt;

&lt;p&gt;We do this to answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which button gets more clicks?&lt;/li&gt;
&lt;li&gt;Which headline makes more people sign up?&lt;/li&gt;
&lt;li&gt;Which page keeps users longer?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Simple example
&lt;/h3&gt;

&lt;p&gt;Imagine you have a website with a “Sign Up” button.&lt;/p&gt;

&lt;p&gt;You are not sure which button color works better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version A: Red button&lt;/li&gt;
&lt;li&gt;Version B: Green button&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You do not just guess. You:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Show the red button to 50% of visitors (Group A).&lt;/li&gt;
&lt;li&gt;Show the green button to the other 50% (Group B).&lt;/li&gt;
&lt;li&gt;Count how many people clicked Sign Up in each group.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbwyb2yc2l7jix0o5z60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbwyb2yc2l7jix0o5z60.png" alt="A-B Testing demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example numbers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Visitors&lt;/th&gt;
&lt;th&gt;Sign Ups&lt;/th&gt;
&lt;th&gt;Conversion Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Green&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here, Version B (green) seems better because 12% &amp;gt; 8%.&lt;/p&gt;

&lt;p&gt;Then you use a statistical test (such as a two-proportion z-test or a chi-square test) to check:&lt;br&gt;
“Is this difference real, or could it be just random chance?”&lt;/p&gt;

&lt;p&gt;If the result is statistically significant (p &amp;lt; 0.05), you choose the better version with confidence.&lt;/p&gt;
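&lt;p&gt;Here is a minimal sketch of that check as a two-proportion z-test, using only the standard library and the numbers from the table above:&lt;/p&gt;

```python
from math import sqrt, erf

# Observed results from the example table
n_a, conv_a = 1000, 80    # red button
n_b, conv_b = 1000, 120   # green button

p_a, p_b = conv_a / n_a, conv_b / n_b
pooled = (conv_a + conv_b) / (n_a + n_b)

# Two-proportion z-test: how many standard errors apart are the two rates?
se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

# Two-sided p-value from the normal CDF
cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
p_value = 2 * (1 - cdf)

print(round(z, 2), round(p_value, 4))
```

&lt;p&gt;For 80/1,000 vs 120/1,000 the z-score comes out around 2.98 and the p-value is well below 0.05, so the green button’s lead looks real rather than random.&lt;/p&gt;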

&lt;h3&gt;
  
  
  Key ideas in A/B testing (in simple words)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Simple meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Conversion&lt;/td&gt;
&lt;td&gt;The action we care about (click, signup, buy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversion rate&lt;/td&gt;
&lt;td&gt;Conversions ÷ total visitors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Significance&lt;/td&gt;
&lt;td&gt;The result is unlikely to be just random&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  2. What is Market Basket Analysis?
&lt;/h2&gt;

&lt;p&gt;Market Basket Analysis (MBA) is used to find which items are often bought together.&lt;/p&gt;

&lt;p&gt;It answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“If a customer buys X, what else are they likely to buy?”&lt;/li&gt;
&lt;li&gt;“Which items should we place together in the store?”&lt;/li&gt;
&lt;li&gt;“Which product combos should we recommend online?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is heavily used in retail and e-commerce.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simple example
&lt;/h3&gt;

&lt;p&gt;Imagine a small grocery shop.&lt;br&gt;
You collect data from different bills (transactions).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncy6olfwzi4iqg0zm4va.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncy6olfwzi4iqg0zm4va.png" alt="Market Basket Analysis" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example transaction data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bill No.&lt;/th&gt;
&lt;th&gt;Items Bought&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Bread, Butter, Milk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Bread, Eggs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Milk, Bread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Bread, Butter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Milk, Eggs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Bread, Milk, Butter&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;From this, you might notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bread appears in many bills.&lt;/li&gt;
&lt;li&gt;Bread and Butter appear together often.&lt;/li&gt;
&lt;li&gt;Bread and Milk also appear together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the shop learns:&lt;br&gt;
“If someone buys Bread, there is a good chance they will also buy Butter.”&lt;/p&gt;

&lt;p&gt;This is exactly what Market Basket Analysis is about.&lt;/p&gt;




&lt;h3&gt;
  
  
  Important terms in Market Basket Analysis
&lt;/h3&gt;

&lt;p&gt;Let’s say we are interested in the rule:&lt;/p&gt;

&lt;p&gt;“If a customer buys Bread, then they also buy Butter.”&lt;/p&gt;

&lt;p&gt;We write this as:&lt;br&gt;
Bread → Butter&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Support
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;How often do Bread and Butter appear together in all bills?&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total bills = 6&lt;/li&gt;
&lt;li&gt;Bills with Bread and Butter together: 3 (Bills 1, 4, 6)&lt;/li&gt;
&lt;li&gt;Support = 3/6 = 0.5 (50%)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Confidence
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;When Bread is bought, how often is Butter also bought?&lt;/li&gt;
&lt;li&gt;Bills with Bread: Bills 1, 2, 3, 4, 6 → 5 bills&lt;/li&gt;
&lt;li&gt;Bills with Bread and Butter: 3&lt;/li&gt;
&lt;li&gt;Confidence = 3/5 = 0.6 (60%)&lt;/li&gt;
&lt;li&gt;Interpretation: If someone buys Bread, there is a 60% chance they also buy Butter.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Lift
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;How much more likely is Butter bought when Bread is bought, compared to how often Butter is bought overall? (Lift = Confidence ÷ Support of Butter)&lt;/li&gt;
&lt;li&gt;If Lift &amp;gt; 1: Bread and Butter are positively associated (good combo).&lt;/li&gt;
&lt;li&gt;If Lift = 1: No special relationship.&lt;/li&gt;
&lt;li&gt;If Lift &amp;lt; 1: They appear together less than expected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You do not need to go deep into the formula right away.&lt;br&gt;
At beginner level, just remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support: How often together?&lt;/li&gt;
&lt;li&gt;Confidence: If A, how likely B?&lt;/li&gt;
&lt;li&gt;Lift: How strong is the relationship?&lt;/li&gt;
&lt;/ul&gt;
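&lt;p&gt;The three measures can be computed straight from the bill data. Here is a minimal Python sketch using the six transactions from the table above (the 0.5 support and 0.6 confidence match the worked numbers; the lift comes out to about 1.2, meaning Bread buyers are a bit more likely than the average customer to also buy Butter):&lt;/p&gt;

```python
# The six bills from the grocery example, as sets of items.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Eggs"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Eggs"},
    {"Bread", "Milk", "Butter"},
]

def support(items):
    """Fraction of bills that contain all the given items."""
    return sum(items <= bill for bill in transactions) / len(transactions)

# Rule: Bread -> Butter
support_both = support({"Bread", "Butter"})     # how often together
confidence = support_both / support({"Bread"})  # if Bread, how likely Butter
lift = confidence / support({"Butter"})         # strength of the association

print(support_both, confidence, lift)
```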




&lt;h3&gt;
  
  
  Where is Market Basket Analysis used?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Online stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Customers who bought this also bought…”&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Supermarkets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Placing chips near soft drinks&lt;/li&gt;
&lt;li&gt;Placing bread near butter and jam&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Food delivery apps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suggesting sides with a main dish (fries with burger, dessert with pizza)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Question it answers&lt;/th&gt;
&lt;th&gt;Data type mainly used&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A/B Testing&lt;/td&gt;
&lt;td&gt;Which version works better?&lt;/td&gt;
&lt;td&gt;Conversions, click rates etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Market Basket Analysis&lt;/td&gt;
&lt;td&gt;Which items are often bought together?&lt;/td&gt;
&lt;td&gt;Transactions (lists of items)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day 7 : Hypothesis Testing Made Super Simple</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Fri, 21 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-7-hypothesis-testing-made-super-simple-2b72</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-7-hypothesis-testing-made-super-simple-2b72</guid>
      <description>&lt;p&gt;&lt;em&gt;Statistics Challenge for Data Scientists&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Hypothesis testing sounds scary, but it’s basically a math way of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“Is this thing really happening, or is it just random chance?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You assume something is true → test it with sample data → decide if evidence is strong enough to reject it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Hypothesis Testing?
&lt;/h2&gt;

&lt;p&gt;Think of it like a court case:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7u2twtijj375l2z07u2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7u2twtijj375l2z07u2y.png" alt="Hypothesis Testing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Meaning (Simple)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Null Hypothesis (H0)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Default assumption. “Nothing has changed.”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Alternative Hypothesis (H1)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opposite claim. “Something has changed.”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Probability of seeing a result at least this extreme, assuming H0 is true.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Significance Level (α)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cutoff (usually 0.05). If p &amp;lt; 0.05 → reject H0.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test Statistic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A number calculated from data to judge the claim.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why Do We Use Hypothesis Testing?
&lt;/h2&gt;

&lt;p&gt;You cannot test entire populations. So you take a &lt;strong&gt;sample&lt;/strong&gt; and check if the sample result is strong enough to represent the population.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does a new medicine work better than the old one?&lt;/li&gt;
&lt;li&gt;Is the average salary different in two cities?&lt;/li&gt;
&lt;li&gt;Is customer churn related to subscription type?&lt;/li&gt;
&lt;li&gt;Are two features correlated?&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👉 Today’s Focus: T-Test and Chi-Square Test&lt;/p&gt;




&lt;h2&gt;
  
  
  1️⃣ &lt;strong&gt;T-Test&lt;/strong&gt; (Also called &lt;strong&gt;Student’s t-test&lt;/strong&gt;)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does it check?
&lt;/h3&gt;

&lt;p&gt;It checks whether the &lt;strong&gt;mean (average)&lt;/strong&gt; of two groups is different.&lt;/p&gt;

&lt;h3&gt;
  
  
  When do we use it?
&lt;/h3&gt;

&lt;p&gt;Use the &lt;strong&gt;t-test&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The variables are &lt;strong&gt;numerical&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sample size is &lt;strong&gt;small (&amp;lt; 30)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Population variance is &lt;strong&gt;unknown&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example (super simple)
&lt;/h3&gt;

&lt;p&gt;You want to test if &lt;strong&gt;average marks&lt;/strong&gt; of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Students in Class A&lt;/li&gt;
&lt;li&gt;Students in Class B
are different.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a &lt;strong&gt;t-test&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6j6c9asygsg2e5fsz2xa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6j6c9asygsg2e5fsz2xa.png" alt="t-test hypothesis testing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What does T-test output mean?
&lt;/h3&gt;

&lt;p&gt;If &lt;strong&gt;p &amp;lt; 0.05&lt;/strong&gt; → the difference is statistically significant (unlikely to be chance alone).&lt;br&gt;
If &lt;strong&gt;p ≥ 0.05&lt;/strong&gt; → the difference could easily be due to chance.&lt;/p&gt;
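&lt;p&gt;A quick sketch of this test in Python, assuming &lt;code&gt;scipy&lt;/code&gt; is available and using made-up marks for the two classes:&lt;/p&gt;

```python
from scipy import stats

# Hypothetical marks (small samples, population variance unknown).
class_a = [72, 78, 65, 80, 74, 69, 77]
class_b = [60, 64, 71, 58, 66, 63, 70]

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(class_a, class_b, equal_var=False)

if p_value < 0.05:
    print(f"p = {p_value:.4f} -> the difference is statistically significant")
else:
    print(f"p = {p_value:.4f} -> the difference may just be chance")
```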




&lt;h2&gt;
  
  
  2️⃣ &lt;strong&gt;Chi-Square (χ²) Test&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does it check?
&lt;/h3&gt;

&lt;p&gt;It checks if &lt;strong&gt;two categorical variables&lt;/strong&gt; are related.&lt;/p&gt;

&lt;p&gt;Examples of categorical variables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gender (Male/Female)&lt;/li&gt;
&lt;li&gt;Payment mode (UPI/Card/Cash)&lt;/li&gt;
&lt;li&gt;Pass/Fail&lt;/li&gt;
&lt;li&gt;Yes/No&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When do you use Chi-square?
&lt;/h3&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both variables are &lt;strong&gt;categories&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;You want to test &lt;strong&gt;independence&lt;/strong&gt;
(“Are these two things connected or completely unrelated?”)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;You want to know if &lt;strong&gt;gender affects shopping preference&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gender&lt;/th&gt;
&lt;th&gt;Likes Online&lt;/th&gt;
&lt;th&gt;Likes Offline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Male&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Female&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwp2h2pc81x940bjgd2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwp2h2pc81x940bjgd2w.png" alt="chi-square hypothesis testing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;Chi-square&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpretation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If &lt;strong&gt;p &amp;lt; 0.05&lt;/strong&gt; → the two variables are &lt;strong&gt;dependent&lt;/strong&gt; (related).
Example: the data suggest gender is related to preference.&lt;/li&gt;
&lt;li&gt;If &lt;strong&gt;p ≥ 0.05&lt;/strong&gt; → no evidence of a relationship; treat the variables as &lt;strong&gt;independent&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
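&lt;p&gt;The table above can be tested with &lt;code&gt;scipy&lt;/code&gt;. Note that for these particular counts the p-value comes out above 0.05, so this small sample gives no significant evidence that gender and preference are related:&lt;/p&gt;

```python
from scipy.stats import chi2_contingency

# Contingency table from the example: rows = Male/Female, cols = Online/Offline.
observed = [[35, 25],
            [40, 20]]

chi2, p_value, dof, expected = chi2_contingency(observed)

if p_value < 0.05:
    print("Dependent: the variables appear related")
else:
    print("Independent: no evidence of a relationship")
```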




&lt;h2&gt;
  
  
  Summary Table (Easy to remember)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Data Type&lt;/th&gt;
&lt;th&gt;What It Checks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T-Test&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compare 2 groups’ means&lt;/td&gt;
&lt;td&gt;Numerical&lt;/td&gt;
&lt;td&gt;Difference in averages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chi-Square&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Check relation between categories&lt;/td&gt;
&lt;td&gt;Categorical&lt;/td&gt;
&lt;td&gt;Dependency / independence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🧡 A Simple Visual View (Mental Model)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  T-Test
&lt;/h3&gt;

&lt;p&gt;Imagine two classrooms took the same exam.&lt;br&gt;
You compare their average marks and ask:&lt;br&gt;
“Is one class truly scoring higher, or is the difference just chance?”&lt;/p&gt;

&lt;h3&gt;
  
  
  Chi-Square
&lt;/h3&gt;

&lt;p&gt;Imagine men and women choosing between online and offline shopping.&lt;br&gt;
You ask:&lt;br&gt;
“Is the choice different because of gender, or is it unrelated?”&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Word
&lt;/h2&gt;

&lt;p&gt;Hypothesis testing is not about proving you are right.&lt;br&gt;
It is about checking whether the data strongly disagrees with the default assumption (H0).&lt;/p&gt;

&lt;p&gt;If the disagreement is strong → H0 gets rejected.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Statistics Day 6: Your First Data Science Superpower: Feature Selection with Correlation &amp; Variance</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Thu, 20 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-6-your-first-data-science-superpower-feature-selection-with-correlation-variance-5eeb</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-6-your-first-data-science-superpower-feature-selection-with-correlation-variance-5eeb</guid>
      <description>&lt;p&gt;Feature selection is one of the most important steps before building any machine learning model.&lt;/p&gt;

&lt;p&gt;And one of the simplest tools to do this is &lt;strong&gt;correlation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But correlation alone doesn’t tell the whole story.&lt;br&gt;
To use it correctly, you also need to understand &lt;strong&gt;variance&lt;/strong&gt;, &lt;strong&gt;standard deviation&lt;/strong&gt;, and a few other related statistical terms.&lt;/p&gt;

&lt;p&gt;This blog breaks everything down in the simplest way possible — no heavy maths, just practical understanding.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1. What Is Correlation?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Correlation tells us &lt;strong&gt;how two numerical features move together&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If they grow together → &lt;strong&gt;positive correlation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If one grows while the other falls → &lt;strong&gt;negative correlation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If they don’t move in any clear pattern → &lt;strong&gt;zero correlation&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Correlation ranges from &lt;strong&gt;–1 to +1&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;+1&lt;/strong&gt; → perfectly move together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;–1&lt;/strong&gt; → perfectly opposite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; → no relationship&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In feature selection, correlation helps you answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Which features are actually related to the target?”&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;“Which features are repeating the same information?”&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. How Do We Use Correlation for Feature Selection?&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A. Select Features That Are Correlated With the Target&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you're predicting &lt;strong&gt;house price&lt;/strong&gt;, and &lt;code&gt;size_in_sqft&lt;/code&gt; has &lt;strong&gt;high correlation&lt;/strong&gt; with price, that feature is useful.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Correlation with Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Size (sqft)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.82&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No. of rooms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.65&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Age of house&lt;/td&gt;
&lt;td&gt;–0.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zip code&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;High correlation → strong predictive power.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cfzok14rrna4agk5kau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cfzok14rrna4agk5kau.png" alt="Correlation Heatmap" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
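&lt;p&gt;With &lt;code&gt;pandas&lt;/code&gt; this check is essentially one line. The numbers and column names below are invented for illustration:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical housing data.
df = pd.DataFrame({
    "size_sqft": [800, 950, 1100, 1300, 1500, 1700],
    "rooms":     [2, 2, 3, 3, 4, 4],
    "age_years": [30, 25, 20, 15, 10, 5],
    "price":     [100, 120, 140, 165, 185, 210],
})

# Correlation of every feature with the target column.
correlations = df.corr()["price"].drop("price")
print(correlations.sort_values(ascending=False))
```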




&lt;h3&gt;
  
  
  &lt;strong&gt;B. Remove Features That Are Highly Correlated With Each Other&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When two features are &lt;strong&gt;too similar&lt;/strong&gt;, they cause &lt;strong&gt;multicollinearity&lt;/strong&gt;, which confuses models (especially regression).&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;height&lt;/code&gt; and &lt;code&gt;total_floors&lt;/code&gt; → correlation 0.95&lt;/li&gt;
&lt;li&gt;They’re giving the same information.&lt;/li&gt;
&lt;li&gt;You keep &lt;strong&gt;only one&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes your model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simpler&lt;/li&gt;
&lt;li&gt;faster&lt;/li&gt;
&lt;li&gt;less noisy&lt;/li&gt;
&lt;li&gt;more stable&lt;/li&gt;
&lt;/ul&gt;
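&lt;p&gt;A common &lt;code&gt;pandas&lt;/code&gt;/&lt;code&gt;numpy&lt;/code&gt; idiom for this: look at the upper triangle of the absolute correlation matrix (so each pair is checked only once) and drop one column from every highly correlated pair. The data below is invented:&lt;/p&gt;

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height":       [10, 20, 30, 40, 50],
    "total_floors": [3, 6, 9, 12, 15],   # a scaled copy of "height"
    "age":          [5, 40, 12, 33, 20],
})

corr = df.corr().abs()
# Keep only the strict upper triangle so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))

to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)

print(to_drop)  # ["total_floors"]
print(df_reduced.columns.tolist())
```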




&lt;h2&gt;
  
  
  &lt;strong&gt;C. The Big Warning: Correlation Only Catches &lt;em&gt;Linear&lt;/em&gt; Relationships&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If a feature has a non-linear relationship with the target, correlation may say &lt;strong&gt;“0”&lt;/strong&gt;, even when the feature is useful.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Predicting salary based on experience — relationship grows but flattens → &lt;strong&gt;non-linear curve&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low correlation does not mean useless feature.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsesiz9q02o1dqzqz60c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsesiz9q02o1dqzqz60c.png" alt="High vs Low Correlation" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best practice:&lt;/strong&gt;&lt;br&gt;
Include the feature anyway and check feature importance using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random Forest&lt;/li&gt;
&lt;li&gt;XGBoost&lt;/li&gt;
&lt;li&gt;SHAP values&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;3. Variance — How Spread Out the Data Is&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Variance tells you &lt;strong&gt;how much the values are spread from the average&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low variance → values are almost the same&lt;/li&gt;
&lt;li&gt;High variance → wide variety of values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Values&lt;/th&gt;
&lt;th&gt;Variance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;50, 50, 50, 50&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10, 80, 120, 200&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In feature selection:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features with extremely low variance (almost constant features) should be removed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mclopt21oymt98rn7ur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mclopt21oymt98rn7ur.png" alt="Variance graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A column with 99% “No” and 1% “Yes”&lt;/li&gt;
&lt;li&gt;Gives almost no information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is called &lt;strong&gt;low-variance filtering&lt;/strong&gt;.&lt;/p&gt;
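&lt;p&gt;A minimal sketch of low-variance filtering with &lt;code&gt;pandas&lt;/code&gt; (scikit-learn's &lt;code&gt;VarianceThreshold&lt;/code&gt; does the same job; the 0.5 cutoff here is an arbitrary illustration):&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({
    "almost_constant": [0, 0, 0, 0, 0, 1],      # mostly the same value
    "useful":          [10, 80, 120, 200, 55, 90],
})

variances = df.var()
low_variance_cols = variances[variances < 0.5].index
df_filtered = df.drop(columns=low_variance_cols)

print(df_filtered.columns.tolist())  # ["useful"]
```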




&lt;h2&gt;
  
  
  &lt;strong&gt;4. Standard Deviation — The More Interpretable Version of Variance&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Standard deviation (SD) is the &lt;strong&gt;square root of variance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why do we use SD?&lt;/p&gt;

&lt;p&gt;Because SD is in the &lt;strong&gt;same units as the data&lt;/strong&gt;, so it’s easier to interpret.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variance = 2500&lt;/li&gt;
&lt;li&gt;SD = √2500 = 50&lt;/li&gt;
&lt;li&gt;Meaning: “On average, values are about 50 units away from the mean.”&lt;/li&gt;
&lt;/ul&gt;
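&lt;p&gt;The variance-2500 / SD-50 example can be checked in a couple of lines of &lt;code&gt;numpy&lt;/code&gt; (the four values below are chosen so the numbers come out exactly):&lt;/p&gt;

```python
import numpy as np

values = np.array([0, 100, 0, 100])  # mean = 50

variance = values.var()  # 2500.0
sd = values.std()        # 50.0 -- same units as the data

print(variance, sd)
```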

&lt;p&gt;In data science:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High SD → more spread&lt;/li&gt;
&lt;li&gt;Low SD → less spread&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SD is important in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;normal distribution&lt;/li&gt;
&lt;li&gt;Z-score normalization&lt;/li&gt;
&lt;li&gt;outlier detection&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;5. Practical Use Cases in Real Data Science&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A. Feature Engineering&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Remove highly correlated features&lt;/li&gt;
&lt;li&gt;Keep features that correlate with the target&lt;/li&gt;
&lt;li&gt;Remove low-variance features&lt;/li&gt;
&lt;li&gt;Treat outliers using SD&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;B. Model Stability (Regression Models)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;High correlation among features (multicollinearity):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inflates coefficients&lt;/li&gt;
&lt;li&gt;makes the model unstable&lt;/li&gt;
&lt;li&gt;reduces interpretability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correlation matrix&lt;/li&gt;
&lt;li&gt;Variance Inflation Factor (VIF)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;C. Detecting Outliers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Using SD:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any value more than 3 SD from the mean is often considered an outlier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps clean the dataset before modeling.&lt;/p&gt;
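&lt;p&gt;A small sketch of the 3-SD rule with &lt;code&gt;numpy&lt;/code&gt;, using made-up readings:&lt;/p&gt;

```python
import numpy as np

# 20 ordinary readings clustered near 50, plus one suspicious value.
data = np.array([48, 49, 50, 51, 52] * 4 + [150])

z_scores = (data - data.mean()) / data.std()
outliers = data[np.abs(z_scores) > 3]

print(outliers)  # [150]
```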




&lt;h3&gt;
  
  
  &lt;strong&gt;D. Normalization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Z-score = (value – mean) ÷ SD&lt;br&gt;
Used heavily in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KNN&lt;/li&gt;
&lt;li&gt;SVM&lt;/li&gt;
&lt;li&gt;Gradient descent-based models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because these models depend on &lt;strong&gt;distance&lt;/strong&gt;, standardization is essential.&lt;/p&gt;
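&lt;p&gt;In practice you would usually reach for scikit-learn's &lt;code&gt;StandardScaler&lt;/code&gt;, but the formula itself is short enough to apply by hand with &lt;code&gt;numpy&lt;/code&gt; (invented numbers):&lt;/p&gt;

```python
import numpy as np

# Two features on very different scales: income and age.
X = np.array([[30000.0, 25.0],
              [60000.0, 40.0],
              [90000.0, 55.0]])

# Z-score each column: (value - mean) / SD.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # [1, 1]
```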




&lt;h2&gt;
  
  
  &lt;strong&gt;6. Quick Summary Table&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Why It Matters for Feature Selection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Correlation&lt;/td&gt;
&lt;td&gt;How two features move together&lt;/td&gt;
&lt;td&gt;Helps identify useful or redundant features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variance&lt;/td&gt;
&lt;td&gt;How spread out the data is&lt;/td&gt;
&lt;td&gt;Remove near-constant features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard Deviation&lt;/td&gt;
&lt;td&gt;Average spread from the mean&lt;/td&gt;
&lt;td&gt;Used in scaling and outlier detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High Feature-to-Target Correlation&lt;/td&gt;
&lt;td&gt;Strong predictor&lt;/td&gt;
&lt;td&gt;Keep it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High Feature-to-Feature Correlation&lt;/td&gt;
&lt;td&gt;Redundant&lt;/td&gt;
&lt;td&gt;Remove one&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Low Correlation&lt;/td&gt;
&lt;td&gt;Not always useless&lt;/td&gt;
&lt;td&gt;Check with ML model importance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;7. Final Takeaways&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use correlation to &lt;strong&gt;pick predictive features&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Remove features that are &lt;strong&gt;too similar&lt;/strong&gt; to each other.&lt;/li&gt;
&lt;li&gt;Use variance and standard deviation to spot &lt;strong&gt;boring or noisy features&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Always validate with ML models because &lt;strong&gt;correlation misses non-linear relationships&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feature selection is not just theory — it’s one of the most practical skills in data science.&lt;/p&gt;

&lt;p&gt;If you understand correlation, variance, and SD, you're already ahead.&lt;/p&gt;




&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>datascience</category>
      <category>statistics</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day5: The Super-Simple Guide to Random Variables and Correlation for Data Science Beginners</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Wed, 19 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day5-the-super-simple-guide-to-random-variables-and-correlation-for-data-science-1e8d</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day5-the-super-simple-guide-to-random-variables-and-correlation-for-data-science-1e8d</guid>
      <description>&lt;p&gt;If you’re learning statistics for data science, you’ll hear words that sound very big: &lt;em&gt;random variables&lt;/em&gt;, &lt;em&gt;PDF&lt;/em&gt;, &lt;em&gt;correlation&lt;/em&gt;, and more.&lt;/p&gt;

&lt;p&gt;But don’t worry.&lt;br&gt;
Today, we’ll break everything down in simple language so even a 10-year-old can follow.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Random Variable?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;random variable&lt;/strong&gt; is just a number that comes from a random activity.&lt;/p&gt;

&lt;p&gt;Think of it like this:&lt;br&gt;
You do something uncertain → you get a number as a result.&lt;/p&gt;

&lt;p&gt;Example: Roll a die → you get 1, 2, 3, 4, 5, or 6.&lt;br&gt;
That number is your &lt;em&gt;random variable&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;There are two types:&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Discrete Random Variables
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Discrete&lt;/strong&gt; means you can &lt;em&gt;count&lt;/em&gt; the possible values.&lt;br&gt;
They come in separate chunks — no in-between values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of chocolates in a box (you can’t have 4.6 chocolates)&lt;/li&gt;
&lt;li&gt;Number of students absent&lt;/li&gt;
&lt;li&gt;Die roll outcome (1–6)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in data science?&lt;/strong&gt;&lt;br&gt;
You use discrete random variables when your feature takes clear, countable values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm191xav06a32xftdciz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm191xav06a32xftdciz.png" alt="demonstration of discrete and continuous random variables" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Continuous Random Variables
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Continuous&lt;/strong&gt; means the values can be &lt;em&gt;anything&lt;/em&gt; in a range — even decimals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Height (160.25 cm is possible)&lt;/li&gt;
&lt;li&gt;Temperature (34.7°C, 34.75°C…)&lt;/li&gt;
&lt;li&gt;Weight&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters?&lt;/strong&gt;&lt;br&gt;
Many ML models assume continuous data follows patterns like the &lt;strong&gt;normal distribution&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Normal Distribution?
&lt;/h2&gt;

&lt;p&gt;A normal distribution is the famous &lt;strong&gt;bell-shaped curve&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6swen6pj7wym9fe59fp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6swen6pj7wym9fe59fp.png" alt="Normal distribution" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It looks like a hill that is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;highest in the middle&lt;/li&gt;
&lt;li&gt;smooth&lt;/li&gt;
&lt;li&gt;symmetric&lt;/li&gt;
&lt;li&gt;values near the mean are more common&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: Most people’s heights cluster around an average.&lt;br&gt;
Only a few are extremely short or extremely tall.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is the Probability Density Function (PDF)?
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;PDF&lt;/strong&gt; is simply a formula that tells us:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How likely is a value to appear in a continuous distribution?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For a normal distribution, the PDF looks complicated, but the meaning is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It helps us find probabilities for continuous values&lt;/li&gt;
&lt;li&gt;The highest point is at the mean (most likely)&lt;/li&gt;
&lt;li&gt;The sides go down smoothly (less likely)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You &lt;strong&gt;cannot&lt;/strong&gt; take one point and say “this value has 10% probability.”&lt;br&gt;
For continuous data, we talk about &lt;strong&gt;areas&lt;/strong&gt; under the curve.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5e5215effeis6nedxfv3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5e5215effeis6nedxfv3.png" alt="probability density function" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Think of the curve as a mountain.&lt;br&gt;
Probability = how much area lies under that mountain between two points.&lt;/p&gt;

&lt;p&gt;This helps in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;calculating confidence intervals&lt;/li&gt;
&lt;li&gt;computing z-scores&lt;/li&gt;
&lt;li&gt;understanding statistical tests&lt;/li&gt;
&lt;/ul&gt;
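&lt;p&gt;To make the “area under the mountain” idea concrete, here is a minimal Python sketch (the height numbers are made up for illustration): it computes the area under a normal curve between two points, building the normal CDF from the standard library’s &lt;code&gt;math.erf&lt;/code&gt;.&lt;/p&gt;

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, built from the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Made-up heights: mean 170 cm, standard deviation 14.14 cm
mu, sigma = 170, 14.14

# The probability of one exact value is 0, so we ask for an *area* instead:
# P(160 cm < height < 180 cm) = area under the curve between 160 and 180
p = normal_cdf(180, mu, sigma) - normal_cdf(160, mu, sigma)
print(round(p, 2))  # about 0.52, so roughly half of all heights
```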




&lt;h2&gt;
  
  
  Pearson's Correlation Coefficient (r)
&lt;/h2&gt;

&lt;p&gt;Pearson’s correlation tells us:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How strongly are two numerical variables related?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It gives a number between -1 and +1:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value (r)&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;+1&lt;/td&gt;
&lt;td&gt;Perfect positive relationship&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;No linear relationship&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;Perfect negative relationship&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gt7qamsci72pw5envwl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gt7qamsci72pw5envwl.png" alt="Pearson's correlation coefficient" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Height vs weight → positive correlation&lt;/li&gt;
&lt;li&gt;Age vs hours spent playing with toys → negative correlation&lt;/li&gt;
&lt;li&gt;Shoe size vs IQ → almost zero correlation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In simple terms:&lt;br&gt;
If one goes up and the other goes up too → positive.&lt;br&gt;
If one goes up and the other goes down → negative.&lt;/p&gt;
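&lt;p&gt;You don’t even need a library to compute r. Here is a minimal from-scratch sketch (the study-hours numbers are invented for illustration):&lt;/p&gt;

```python
# Pearson's r from scratch: covariance divided by the product of spreads.
def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

study_hours = [1, 2, 3, 4, 5]        # made-up data
test_scores = [52, 60, 68, 76, 84]   # rises exactly with study time

print(round(pearson_r(study_hours, test_scores), 4))  # 1.0, perfect positive
```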




&lt;h2&gt;
  
  
  Practical Use Cases
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Real-Life Use&lt;/th&gt;
&lt;th&gt;Data Science Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discrete RV&lt;/td&gt;
&lt;td&gt;Counting customers&lt;/td&gt;
&lt;td&gt;Classification features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous RV&lt;/td&gt;
&lt;td&gt;Measuring weight or speed&lt;/td&gt;
&lt;td&gt;Regression, clustering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDF&lt;/td&gt;
&lt;td&gt;Finding chances in continuous data&lt;/td&gt;
&lt;td&gt;Hypothesis testing, probability models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pearson Correlation&lt;/td&gt;
&lt;td&gt;See if two things are linked&lt;/td&gt;
&lt;td&gt;Feature selection, EDA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  When Are These Useful in Machine Learning?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Feature Engineering
&lt;/h3&gt;

&lt;p&gt;Correlation helps detect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;predictive features&lt;/li&gt;
&lt;li&gt;multicollinearity (when features are too similar)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Understanding Your Dataset
&lt;/h3&gt;

&lt;p&gt;Random variables and distributions help decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which visualization to use&lt;/li&gt;
&lt;li&gt;Which model suits the data&lt;/li&gt;
&lt;li&gt;Whether scaling/normalization is required&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Statistical Testing
&lt;/h3&gt;

&lt;p&gt;PDF + normal distribution help compute:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;z-scores&lt;/li&gt;
&lt;li&gt;p-values&lt;/li&gt;
&lt;li&gt;confidence intervals&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Simple Examples to Lock the Concepts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Example 1: Discrete
&lt;/h3&gt;

&lt;p&gt;Number of pets in a house:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0, 1, 2, 3… Countable. No decimals.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example 2: Continuous
&lt;/h3&gt;

&lt;p&gt;Time taken to run 100 meters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;12.5s, 12.51s, 12.512s… Infinite possibilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example 3: Pearson Correlation
&lt;/h3&gt;

&lt;p&gt;Study time vs test score → high positive&lt;br&gt;
Ice cream sales vs temperature → positive&lt;br&gt;
Mobile use vs sleep → negative&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on LinkedIn: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day 4: Z-Score vs Min-Max Normalization — Making Data Fair for ML Models</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Sat, 15 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-4-z-score-vs-min-max-normalization-making-data-fair-for-ml-models-1plc</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-4-z-score-vs-min-max-normalization-making-data-fair-for-ml-models-1plc</guid>
      <description>&lt;p&gt;Welcome back to the &lt;strong&gt;Statistics Challenge for Data Scientists!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Today, we’re learning something that makes our data fair — &lt;strong&gt;Normalization&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Normalization?
&lt;/h2&gt;

&lt;p&gt;Imagine you and your friend are running a race.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You run &lt;strong&gt;100 meters&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Your friend runs &lt;strong&gt;1 kilometer (1000 meters)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Can we directly compare who runs faster?&lt;br&gt;
Not really — because the &lt;strong&gt;units and scales are different.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s exactly what happens with data — some numbers are small (like &lt;em&gt;age&lt;/em&gt;), and some are huge (like &lt;em&gt;salary&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Normalization&lt;/strong&gt; means scaling data so that all values fit into a similar range and can be compared fairly.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Do We Need Normalization?
&lt;/h2&gt;

&lt;p&gt;Think of a teacher giving marks to students:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Math score: 100 marks&lt;/li&gt;
&lt;li&gt;Science score: 50 marks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we add them directly, Math will dominate because its maximum is higher.&lt;/p&gt;

&lt;p&gt;To treat both subjects fairly, we scale the marks — that’s &lt;strong&gt;normalization&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In data science, normalization helps machine learning models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Work faster&lt;/li&gt;
&lt;li&gt;Learn better&lt;/li&gt;
&lt;li&gt;Give fair importance to each feature&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Two Popular Normalization Methods
&lt;/h2&gt;

&lt;p&gt;Let’s understand the two most common types — &lt;strong&gt;Min-Max Normalization&lt;/strong&gt; and &lt;strong&gt;Z-Score Normalization&lt;/strong&gt;.&lt;/p&gt;


&lt;h3&gt;
  
  
  1. Min-Max Normalization (Feature Scaling)
&lt;/h3&gt;

&lt;p&gt;It squeezes all data values between &lt;strong&gt;0 and 1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X' = (X - Xmin) / (Xmax - Xmin)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Let’s say we have ages: 10, 20, 30, 40, 50.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimum = 10&lt;/li&gt;
&lt;li&gt;Maximum = 50&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For age = 30&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X' = (30 - 10) / (50 - 10) = 20 / 40 = 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, the normalized value is &lt;strong&gt;0.5&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmy1onaiz1gm60ac4zf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmy1onaiz1gm60ac4zf2.png" alt="Min-max normalization demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
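&lt;p&gt;The same calculation in a few lines of Python, using the ages from the example above:&lt;/p&gt;

```python
# Min-max normalization: squeeze every value into the 0-to-1 range.
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [10, 20, 30, 40, 50]
print(min_max(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```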

&lt;p&gt;&lt;strong&gt;When to Use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When your data has a fixed range (like 0 to 100 marks).&lt;/li&gt;
&lt;li&gt;Best for distance-based algorithms (like KNN, K-Means); gradient-based models (like Neural Networks) also benefit.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Z-Score Normalization (Standardization)
&lt;/h3&gt;

&lt;p&gt;This method centers the data around &lt;strong&gt;mean = 0&lt;/strong&gt; and &lt;strong&gt;standard deviation = 1&lt;/strong&gt;.&lt;br&gt;
It shows &lt;strong&gt;how far each value is from the average&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Z = (X - μ) / σ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;μ&lt;/strong&gt; = Mean of the data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;σ&lt;/strong&gt; = Standard deviation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Let’s say heights (in cm): 150, 160, 170, 180, 190&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mean (μ) = 170&lt;/li&gt;
&lt;li&gt;Standard deviation (σ) = 14.14&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For height = 150&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Z = (150 - 170) / 14.14 = -1.41
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, 150 cm is &lt;strong&gt;1.41 standard deviations below the mean&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ndk9n3gqo7p6uuke7ei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ndk9n3gqo7p6uuke7ei.png" alt="z-score normalization demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
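&lt;p&gt;And the same heights in Python, using the standard library’s &lt;code&gt;statistics&lt;/code&gt; module (note: &lt;code&gt;pstdev&lt;/code&gt; is the &lt;em&gt;population&lt;/em&gt; standard deviation, which matches the 14.14 above):&lt;/p&gt;

```python
import statistics

# Z-score normalization of the heights from the example above.
heights = [150, 160, 170, 180, 190]
mu = statistics.mean(heights)       # 170
sigma = statistics.pstdev(heights)  # population SD, about 14.14

z_scores = [round((h - mu) / sigma, 2) for h in heights]
print(z_scores)  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```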

&lt;p&gt;&lt;strong&gt;When to Use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When data doesn’t have a fixed range.&lt;/li&gt;
&lt;li&gt;Works well with algorithms assuming normal distribution (like Linear Regression, Logistic Regression, PCA).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Min-Max vs Z-Score — Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Min-Max Normalization&lt;/th&gt;
&lt;th&gt;Z-Score Normalization&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Range&lt;/td&gt;
&lt;td&gt;0 to 1&lt;/td&gt;
&lt;td&gt;Can be negative or positive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Depends on&lt;/td&gt;
&lt;td&gt;Min &amp;amp; Max values&lt;/td&gt;
&lt;td&gt;Mean &amp;amp; Standard Deviation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sensitive to outliers&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Less sensitive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Bounded data (e.g. exam scores)&lt;/td&gt;
&lt;td&gt;Unbounded data (e.g. height, salary)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Normalization&lt;/strong&gt; makes data fair by bringing all features to a similar scale.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Min-Max&lt;/strong&gt; when data has clear limits (like percentages).&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Z-Score&lt;/strong&gt; when data spreads freely and you care about distance from average.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Recap Example
&lt;/h2&gt;

&lt;p&gt;Using the ages 10, 20, 30, 40, 50 from the Min-Max example (mean = 30, population standard deviation ≈ 14.14):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Original Value&lt;/th&gt;
&lt;th&gt;Min-Max (0-1)&lt;/th&gt;
&lt;th&gt;Z-Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;-1.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;+1.41&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;In short:&lt;/strong&gt;&lt;br&gt;
Normalization is like giving everyone the same playing field so that your machine learning model doesn’t play favorites!&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on LinkedIn: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day 3: Understanding P-Value — The Heart of Hypothesis Testing</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Fri, 14 Nov 2025 05:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-3-understanding-p-value-the-heart-of-hypothesis-testing-4l4p</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-3-understanding-p-value-the-heart-of-hypothesis-testing-4l4p</guid>
      <description>&lt;p&gt;Have you ever tried to prove a point to your friends?&lt;br&gt;
Maybe you said — “I think this coin is magic! It always lands on heads!”&lt;/p&gt;

&lt;p&gt;Your friends would say — “Really? Let’s test it!”&lt;/p&gt;

&lt;p&gt;That’s kind of how data scientists use P-Value — to check if something is truly special or just luck.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: The Simple Idea
&lt;/h2&gt;

&lt;p&gt;P-Value helps us decide whether what we see in data is real or just a coincidence.&lt;/p&gt;

&lt;p&gt;Let’s say you flip a coin 10 times.&lt;br&gt;
It lands on heads 9 times. 😮&lt;/p&gt;

&lt;p&gt;Now you wonder — “Is this coin really unfair, or did I just get lucky?”&lt;/p&gt;

&lt;p&gt;That’s when P-Value comes in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkllz2h13aefci8b8yvi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkllz2h13aefci8b8yvi.png" alt="evaluating p-value using coin tossing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: How P-Value Works
&lt;/h2&gt;

&lt;p&gt;Imagine a little helper called P-Val, who whispers to you how “surprising” your result is.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;If your coin result is...&lt;/th&gt;
&lt;th&gt;P-Val says...&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Very normal (like 5 heads, 5 tails)&lt;/td&gt;
&lt;td&gt;“That’s common!”&lt;/td&gt;
&lt;td&gt;Nothing special here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A bit unusual (like 7 heads, 3 tails)&lt;/td&gt;
&lt;td&gt;“Hmm, slightly surprising.”&lt;/td&gt;
&lt;td&gt;Could be luck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Super weird (like 9 heads, 1 tail)&lt;/td&gt;
&lt;td&gt;“Whoa! That’s rare!”&lt;/td&gt;
&lt;td&gt;Maybe the coin is unfair&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So, the smaller the P-Value, the more unusual your result is — and the more likely you’ve found something real!&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: The Magic Number — 0.05
&lt;/h2&gt;

&lt;p&gt;Scientists often use 0.05 (5%) as a magic line.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;P-Value&lt;/th&gt;
&lt;th&gt;What We Decide&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Less than 0.05&lt;/td&gt;
&lt;td&gt;“Wow! Probably something real happening here!”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;More than 0.05&lt;/td&gt;
&lt;td&gt;“Hmm, might just be luck.”&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So if your P-Value is 0.03, you’d say —&lt;br&gt;
👉 “This is rare! Maybe my coin is really unfair.”&lt;/p&gt;

&lt;p&gt;But if it’s 0.20, you’d say —&lt;br&gt;
👉 “That’s not rare enough. Probably just chance.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pby8f20l2nq4qjptksf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pby8f20l2nq4qjptksf.png" alt="p-value description" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: In Technical Terms
&lt;/h2&gt;

&lt;p&gt;Null Hypothesis (H₀) = Nothing special happening.&lt;/p&gt;

&lt;p&gt;Alternative Hypothesis (H₁) = Something special happening.&lt;/p&gt;

&lt;p&gt;P-Value tells us how likely our data would be if H₀ (nothing special) were actually true.&lt;/p&gt;

&lt;p&gt;So when P-Value is tiny, it means our result is too rare to be just chance, so we reject H₀.&lt;/p&gt;
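&lt;p&gt;For the coin example, you can compute this exactly. Here is a small illustrative sketch (a one-sided exact binomial test, which the post doesn’t spell out): if the coin were fair, how likely is a result at least as extreme as 9 heads in 10 flips?&lt;/p&gt;

```python
import math

# Exact one-sided p-value under H0 ("the coin is fair"):
# the chance of getting at least `heads` heads in `flips` flips.
def p_value_at_least(heads, flips):
    total = 2 ** flips  # all equally likely outcome sequences for a fair coin
    extreme = sum(math.comb(flips, k) for k in range(heads, flips + 1))
    return extreme / total

p = p_value_at_least(9, 10)
print(round(p, 4))  # 0.0107, below 0.05, so we'd reject H0
```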




&lt;h2&gt;
  
  
  Step 5: Real-Life Example
&lt;/h2&gt;

&lt;p&gt;Let’s say a company says —&lt;br&gt;
“Our new cookie recipe makes people 10% happier!” 🍪😁&lt;/p&gt;

&lt;p&gt;We test it on 100 people.&lt;br&gt;
If the P-Value comes out less than 0.05, it means —&lt;br&gt;
→ The happiness difference is real, not just random luck.&lt;/p&gt;

&lt;p&gt;If it’s higher than 0.05,&lt;br&gt;
→ Maybe the cookies are tasty… but not that special. 😅&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0ddrcmyjab5s5j0tdly.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0ddrcmyjab5s5j0tdly.png" alt="p-value demonstration by graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P-Value&lt;/td&gt;
&lt;td&gt;Tells how surprising your result is&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small P-Value (&amp;lt; 0.05)&lt;/td&gt;
&lt;td&gt;Rare → probably something real&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Big P-Value (&amp;gt; 0.05)&lt;/td&gt;
&lt;td&gt;Common → probably just luck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Helps with&lt;/td&gt;
&lt;td&gt;Deciding if your finding is real or coincidence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🧭 Final Thought
&lt;/h2&gt;

&lt;p&gt;Think of P-Value like a surprise meter.&lt;br&gt;
It doesn’t prove anything 100%, but it helps you know whether your data is whispering “hey, look deeper!” or “nah, just a coincidence.”&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on LinkedIn: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
